GitHub - lxe/llavavision: A simple "Be My Eyes" web app with a llama.cpp/llava backend

The project's README explains how to set up and launch LLaVaVision, a simple web app inspired by Fuzzy-Search/realtime-bakllava that runs on a llama.cpp/llava backend. The app was written with ChatGPT and Copilot, with some help from its author, @lxe. The first step is setting up the llama.cpp server: install the CUDA toolkit if you want GPU acceleration, build llama.cpp, and download the models from ggml_bakllava-1.
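As a rough sketch of the llama.cpp step, the helper below assembles the command line for a llama.cpp server that loads a BakLLaVA model together with its multimodal projector. The flags (-m, --mmproj, -c, -ngl, --host, --port) are standard llama.cpp server options. The binary path ./server (renamed llama-server in later llama.cpp releases), the model file names, the context size and the GPU layer count are assumptions for illustration, not values taken from the LLaVaVision README. -ngl only matters in a GPU-enabled build, which is where the optional CUDA toolkit comes in.

```python
import shlex
from pathlib import Path


def llama_server_command(
    model: Path,
    mmproj: Path,
    host: str = "0.0.0.0",
    port: int = 8080,
    ctx_size: int = 2048,
    gpu_layers: int = 0,
    binary: str = "./server",  # assumption: called "llama-server" in newer llama.cpp builds
) -> list[str]:
    """Build argv for a llama.cpp server that loads a LLaVA-style model."""
    return [
        binary,
        "-m", str(model),         # language-model weights (GGUF)
        "--mmproj", str(mmproj),  # projector mapping image embeddings into the LLM
        "-c", str(ctx_size),      # context size in tokens
        "-ngl", str(gpu_layers),  # layers offloaded to the GPU (GPU build only)
        "--host", host,           # 0.0.0.0 listens on all interfaces
        "--port", str(port),
    ]


if __name__ == "__main__":
    # Hypothetical file names; use whatever you downloaded from ggml_bakllava-1.
    cmd = llama_server_command(
        Path("models/ggml-model-q4_k.gguf"),
        Path("models/mmproj-model-f16.gguf"),
        gpu_layers=35,
    )
    print(shlex.join(cmd))
```

Returning an argument list rather than a shell string keeps paths with spaces safe when the list is passed straight to subprocess.run; shlex.join is only used to print a copy-pasteable version.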

Next comes launching LLaVaVision itself: clone the repository, set up the environment, create dummy certificates, and start the server. The author notes that HTTPS is required for video to work on mobile devices, since browsers only grant camera access to pages served from a secure origin; the dummy certificates provide that. Once the server is running, it can be opened from a mobile device at the host machine's IP address, or exposed through a local tunnel with ngrok or localtunnel. The README closes with acknowledgements and sources of inspiration: Fuzzy-Search/realtime-bakllava, Multimodal llama.cpp, and llava-vl.github.io.
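To illustrate why the dummy certificates matter, here is a minimal sketch, assuming OpenSSL is installed, that builds an openssl req command for a throwaway self-signed certificate and serves a directory over HTTPS with Python's standard library. This is a stand-in, not LLaVaVision's own server code; the file names, port 8443 and the function names are hypothetical.

```python
import functools
import http.server
import ssl
from pathlib import Path


def self_signed_cert_cmd(cert: Path, key: Path, days: int = 365, cn: str = "localhost") -> list[str]:
    """argv for `openssl req` that writes a throwaway self-signed cert and key."""
    return [
        "openssl", "req", "-x509",
        "-newkey", "rsa:2048",   # generate a fresh 2048-bit RSA key
        "-nodes",                # do not encrypt the private key
        "-keyout", str(key),
        "-out", str(cert),
        "-days", str(days),
        "-subj", f"/CN={cn}",    # fill in the subject so openssl does not prompt
    ]


def serve_https(directory: str, cert: Path, key: Path,
                host: str = "0.0.0.0", port: int = 8443) -> None:
    """Serve `directory` over HTTPS (blocks until interrupted)."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(certfile=str(cert), keyfile=str(key))
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    with http.server.ThreadingHTTPServer((host, port), handler) as httpd:
        # Wrap the listening socket so every accepted connection speaks TLS.
        httpd.socket = ctx.wrap_socket(httpd.socket, server_side=True)
        httpd.serve_forever()
```

For example, run subprocess.run(self_signed_cert_cmd(Path("cert.pem"), Path("key.pem")), check=True) once, then serve_https(".", Path("cert.pem"), Path("key.pem")). Because the certificate is self-signed, the phone's browser will show a certificate warning that has to be accepted first.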

Key takeaways:

LLaVaVision is a simple web app with a llama.cpp/llava backend, built with ChatGPT, Copilot, and some minor help from its creator, @lxe.
The app is inspired by Fuzzy-Search/realtime-bakllava and works as a "Be My Eyes"-style tool.
Setup consists of building the llama.cpp server, downloading the models, starting the server, and launching LLaVaVision.
HTTPS is required for video on mobile devices, and the server can be reached from a mobile device via the host machine's IP address.
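To find the address to open on the phone, the following sketch guesses the host machine's LAN IP with Python's standard socket module and builds an HTTPS URL from it. The probe address, the function names and any port you pass in are illustrative assumptions; use the port the LLaVaVision server actually reports. On machines with several network interfaces the guess may pick the wrong one.

```python
import socket


def guess_lan_ip(probe: tuple[str, int] = ("192.0.2.1", 9)) -> str:
    """Best-effort guess at the address other devices on the LAN can use.

    connect() on a UDP socket sends no packets; it only makes the OS pick
    the local interface it would route through. 192.0.2.1 is a reserved
    documentation address, used here purely as a routing probe.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            s.connect(probe)
            return s.getsockname()[0]
        except OSError:
            return "127.0.0.1"  # no usable route: fall back to loopback


def phone_url(ip: str, port: int) -> str:
    """URL to open on the phone; HTTPS so the browser allows camera access."""
    return f"https://{ip}:{port}"
```

If the phone is not on the same network, the ngrok or localtunnel route mentioned above avoids the need for a LAN address altogether.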
