The article provides a detailed guide on how to install and use Paddler. It includes instructions on how to register llama.cpp instances, install Paddler, and run agents and the load balancer. It also mentions that Paddler has a dashboard that can be enabled to see the status of the agents. The roadmap for Paddler includes features like a basic load balancer, circuit breaker, OpenTelemetry observer, and integration with AWS Auto Scaling.
Key takeaways:
- Paddler is an open-source, stateful load balancer and reverse proxy designed specifically for servers running llama.cpp; it is aware of each server's available slots.
- Paddler uses agents to monitor the health of individual llama.cpp instances and report it to the load balancer, and it supports adding or removing llama.cpp servers dynamically.
- The agent should be installed on the same host as llama.cpp. It needs a few pieces of information to connect to the llama.cpp instance and report its health status.
- Paddler's load balancer collects data from the agents and exposes a reverse proxy to the outside world. It requires two sets of flags: one to listen for updates from agents and one to make it reachable from outside hosts.
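
To make the idea of a slot-aware, stateful balancer concrete, here is a minimal Python sketch. It is not Paddler's actual code: the names `AgentStatus`, `SlotAwareBalancer`, `report`, `remove`, and `pick`, the fields in the status report, and the example addresses are all hypothetical, and the selection rule (prefer the healthy instance with the most idle slots) is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AgentStatus:
    """One agent's report about the llama.cpp instance it monitors.

    Hypothetical shape: the article does not specify the report format.
    """
    address: str           # where the llama.cpp instance can be reached
    slots_idle: int        # slots currently free to take a request
    slots_processing: int  # slots currently busy
    healthy: bool = True


@dataclass
class SlotAwareBalancer:
    """Keeps the latest report per instance and routes by idle slots."""
    agents: Dict[str, AgentStatus] = field(default_factory=dict)

    def report(self, status: AgentStatus) -> None:
        # Called whenever an agent sends an update; a previously unknown
        # address is registered on the fly (dynamic addition).
        self.agents[status.address] = status

    def remove(self, address: str) -> None:
        # Dynamic removal of an instance; unknown addresses are ignored.
        self.agents.pop(address, None)

    def pick(self) -> Optional[str]:
        # Consider only healthy instances that still have a free slot.
        candidates = [
            a for a in self.agents.values() if a.healthy and a.slots_idle > 0
        ]
        if not candidates:
            return None  # nothing available; a caller could queue or reject
        best = max(candidates, key=lambda a: a.slots_idle)
        # Reserve the slot locally until the next agent report arrives.
        best.slots_idle -= 1
        best.slots_processing += 1
        return best.address


balancer = SlotAwareBalancer()
balancer.report(AgentStatus("10.0.0.1:8080", slots_idle=2, slots_processing=0))
balancer.report(AgentStatus("10.0.0.2:8080", slots_idle=1, slots_processing=1))
first_target = balancer.pick()  # the instance with the most idle slots
```

Because the balancer keeps per-instance slot counts rather than rotating blindly through a server list, it can avoid sending a request to an instance whose slots are all busy, which is the behaviour the takeaways attribute to a stateful design.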