The discussion also covers the possibility of packaging these models with existing games so they run locally, which would eliminate per-request inference costs. Some participants share their experience of running models such as Mistral 7B and Llama 3 locally with decent results, and others suggest using smaller models that can run reliably and quickly on consumer hardware. They note, however, that this would require testing and could run into compatibility issues across GPU vendors and software versions.
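As a rough illustration of the local-packaging idea, the sketch below uses llama-cpp-python to load a quantized Mistral 7B file bundled with a game's assets and generate a short NPC reply on the CPU. The model path, prompt, and generation settings are placeholders chosen for the example, not anything specified in the discussion.

```python
# Minimal sketch: local, CPU-friendly inference with llama-cpp-python.
# Assumes a quantized GGUF build of Mistral 7B is shipped alongside the game;
# the path and generation settings below are illustrative placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="assets/models/mistral-7b-instruct-q4_k_m.gguf",  # hypothetical bundled file
    n_ctx=2048,        # context window; larger values cost more RAM
    n_gpu_layers=0,    # 0 = pure CPU; raise (or use -1) if a compatible GPU is present
    verbose=False,
)

def npc_reply(player_line: str) -> str:
    """Generate a short in-character response to the player's line."""
    prompt = (
        "You are a terse blacksmith in a fantasy village.\n"
        f"Player: {player_line}\n"
        "Blacksmith:"
    )
    out = llm(prompt, max_tokens=64, temperature=0.7, stop=["Player:", "\n\n"])
    return out["choices"][0]["text"].strip()

if __name__ == "__main__":
    print(npc_reply("Can you repair my sword before nightfall?"))
```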
Key takeaways:
- Running large language models (LLMs) at the edge is possible on most hardware, but latency and throughput often fall short of expectations, especially without a GPU.
- Centralizing inference off-device in a distributed cloud environment remains the most viable option for LLMs; a local-first, cloud-fallback sketch follows this list.
- Models like Llama 3 8B and Mistral 7B can run locally for many users and can be fine-tuned for specific use cases.
- Packaging these models with existing games to run locally could eliminate per-request inference costs, but feasibility and performance would need to be tested.
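The first two takeaways suggest a routing decision rather than an either/or choice. The sketch below is one hedged way to express it: try the bundled model first and fall back to a hosted, OpenAI-compatible endpoint when the local model is missing or too slow. `CLOUD_BASE_URL`, the model names, and the two-second latency budget are assumptions made for illustration; nothing in the discussion prescribes them.

```python
# Local-first generation with a cloud fallback: a sketch under stated assumptions,
# not a recommendation from the thread. Assumes llama-cpp-python and the openai
# client are installed, a GGUF model may or may not be present on disk, and
# CLOUD_BASE_URL points at some hosted, OpenAI-compatible endpoint (hypothetical).
import os
import time

CLOUD_BASE_URL = os.environ.get("CLOUD_BASE_URL", "https://example.invalid/v1")  # placeholder
LOCAL_MODEL_PATH = "assets/models/mistral-7b-instruct-q4_k_m.gguf"               # placeholder
LATENCY_BUDGET_S = 2.0  # if local generation is slower than this, prefer the cloud next time

def _load_local():
    """Try to load the bundled model; return None if that isn't possible."""
    try:
        from llama_cpp import Llama
        return Llama(model_path=LOCAL_MODEL_PATH, n_ctx=2048, verbose=False)
    except Exception:
        return None

_local = _load_local()
_prefer_cloud = _local is None

def generate(prompt: str) -> str:
    global _prefer_cloud
    if not _prefer_cloud:
        start = time.monotonic()
        out = _local(prompt, max_tokens=64)
        if time.monotonic() - start > LATENCY_BUDGET_S:
            _prefer_cloud = True  # too slow on this hardware; route future calls off-device
        return out["choices"][0]["text"].strip()

    # Off-device path: any OpenAI-compatible hosted endpoint (assumed for illustration).
    from openai import OpenAI
    client = OpenAI(base_url=CLOUD_BASE_URL, api_key=os.environ.get("CLOUD_API_KEY", ""))
    resp = client.completions.create(model="mistral-7b-instruct", prompt=prompt, max_tokens=64)
    return resp.choices[0].text.strip()
```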