The system goes beyond classic language model APIs, allowing users to apply custom fine-tuning and sampling methods, execute custom paths through the model, and inspect its hidden states. It combines the comforts of an API with the flexibility of PyTorch. The project is part of the BigScience research workshop and has been featured on TechCrunch. Users can join the Discord or subscribe via email to follow the development of the project.
Key takeaways:
- Large language models like Llama 2, Stable Beluga 2, Guanaco-65B, or BLOOM-176B can be run collaboratively, with each participant loading a small part of the model.
- Single-batch inference runs at up to 6 steps/sec for Llama 2 and around 1 step/sec for BLOOM, which is up to 10x faster than offloading, enabling the creation of chatbots and interactive apps.
- The system goes beyond classic language model APIs, letting users apply custom fine-tuning and sampling methods, execute custom paths through the model, and inspect its hidden states.
- The project is part of the BigScience research workshop and offers the comforts of an API with the flexibility of PyTorch.
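The collaborative setup described above, where each participant loads only a small slice of the model and activations are routed through them in sequence, can be sketched in plain Python. This is a conceptual illustration only: the class and function names below are made up for the example and are not the project's actual API, and the "layers" are toy functions standing in for transformer blocks.

```python
class Participant:
    """Holds a contiguous slice of the model's layers (illustrative name)."""

    def __init__(self, layers):
        self.layers = layers  # list of callables standing in for transformer blocks

    def forward(self, x):
        # Run the activations through this participant's local slice.
        for layer in self.layers:
            x = layer(x)
        return x


def run_inference(participants, x):
    """Pass activations through each participant in order,
    like a pipeline over the network."""
    for p in participants:
        x = p.forward(x)
    return x


# Toy "model": six layers that each add 1, split across three participants.
layers = [lambda v: v + 1 for _ in range(6)]
swarm = [
    Participant(layers[0:2]),
    Participant(layers[2:4]),
    Participant(layers[4:6]),
]
print(run_inference(swarm, 0))  # input 0 passes through six +1 layers -> 6
```

In the real system the participants are remote servers and the per-step latency comes from network hops between them, which is why single-batch inference speed is reported in steps per second.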