The next steps cover launching LLaVaVision: cloning the repository and setting up the environment, creating dummy certificates, and starting the server. The author notes that HTTPS is required for mobile video functionality, which is why the dummy certificates are needed. Once the server is running, it can be reached from a mobile device at the machine's IP address. Alternatively, the author suggests starting a local tunnel with ngrok or localtunnel. The article closes with acknowledgements and sources of inspiration: Fuzzy-Search/realtime-bakllava, Multimodal LLama.cpp, and llava-vl.github.io.
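As a rough illustration of the "dummy certificates" step, the sketch below builds and optionally runs an `openssl` command that produces a self-signed certificate. It is not taken from the article: the file names `cert.pem` and `key.pem`, the key size, the validity period, and the helper names `dummy_cert_command` and `create_dummy_cert` are all assumptions made for this example, and it presumes the `openssl` CLI is installed.

```python
import shutil
import subprocess


def dummy_cert_command(cert_path="cert.pem", key_path="key.pem", days=365):
    """Return the argv list for an openssl call that creates a self-signed cert.

    The paths and validity period are illustrative defaults, not values
    taken from the LLaVaVision instructions.
    """
    return [
        "openssl", "req",
        "-x509",               # emit a self-signed certificate, not a CSR
        "-newkey", "rsa:2048", # generate a fresh RSA key alongside it
        "-nodes",              # leave the private key unencrypted
        "-keyout", key_path,
        "-out", cert_path,
        "-days", str(days),
        "-subj", "/CN=localhost",  # skip the interactive subject prompts
    ]


def create_dummy_cert(cert_path="cert.pem", key_path="key.pem", days=365):
    """Run the openssl command; raises if openssl is missing or fails."""
    if shutil.which("openssl") is None:
        raise RuntimeError("openssl CLI not found on PATH")
    subprocess.run(dummy_cert_command(cert_path, key_path, days), check=True)
```

Because the certificate is self-signed, a mobile browser will typically show a warning that has to be accepted before the page loads.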
Key takeaways:
- LLaVaVision is a simple web app with a llama.cpp/llava backend, built using ChatGPT, Copilot, and some minor help from its creator, @lxe.
- The app is inspired by Fuzzy-Search/realtime-bakllava and serves as a "Be My Eyes" tool.
- Getting the app running involves setting up the llama.cpp server, downloading the models, starting the server, and launching LLaVaVision.
- HTTPS is required for mobile video functionality, and the server can be accessed from a mobile device using the machine's IP address.
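To make the last point concrete, here is a minimal sketch of how one might work out the address to open on a phone. It is an illustration only: the helper names `detect_lan_ip` and `mobile_url` are hypothetical, and the default port `5000` is an assumption, so substitute whatever port the LLaVaVision server actually reports when it starts.

```python
import socket


def detect_lan_ip(fallback="127.0.0.1"):
    """Best-effort guess at this machine's LAN address.

    Connecting a UDP socket sends no packets; it only asks the OS which
    local interface would be used to reach the target address.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.connect(("192.0.2.1", 80))  # reserved documentation address
        return sock.getsockname()[0]
    except OSError:
        # No usable route (e.g. offline machine): fall back to loopback.
        return fallback
    finally:
        sock.close()


def mobile_url(ip, port=5000):
    """Build the HTTPS URL to open on the phone (port is an assumed default)."""
    return f"https://{ip}:{port}"


if __name__ == "__main__":
    print("Open this on your phone:", mobile_url(detect_lan_ip()))
```

The phone has to be on the same network as the machine for this address to work; otherwise a tunnel such as ngrok or localtunnel, as mentioned in the article, is the alternative.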