Falcon 180B: Can It Run on Your Computer?

Sep 24, 2023 - kaitchup.substack.com
The article discusses how to run Falcon 180B, a 180-billion-parameter model, on consumer hardware. The author explains that the model, released by the Technology Innovation Institute (TII) of Abu Dhabi, was pre-trained on 3.5 trillion tokens and ranks first on the Open LLM leaderboard. Due to its size, however, it requires extensive computing resources and cannot run on a standard computer without hardware upgrades and a quantized version of the model.
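
To see why the unquantized model is out of reach, a quick back-of-the-envelope calculation helps. This sketch is not taken from the article; it is just the standard bytes-per-parameter arithmetic:

```python
# Approximate weight memory for a 180B-parameter model at various precisions.
# Real usage adds activations, KV cache, and framework overhead on top.
n_params = 180e9

for label, bytes_per_param in [("float16", 2), ("int8", 1), ("4-bit", 0.5)]:
    print(f"{label:>7}: ~{n_params * bytes_per_param / 1e9:,.0f} GB of weights")

# float16: ~360 GB of weights
#    int8: ~180 GB of weights
#   4-bit:  ~90 GB of weights
```

The ~90 GB of 4-bit weights plus runtime overhead is consistent with the article's figure of roughly 100 GB of memory.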

The author suggests using the safetensors format, which loads faster and more safely than pickle-based checkpoints and avoids wasting memory during loading. They also recommend the device_map feature to split the model across the available devices and make the best use of their combined memory. To cut memory consumption further, the author suggests quantizing Falcon 180B to a lower precision. They conclude that with quantization and 100 GB of memory, Falcon 180B can run on a reasonably affordable computer. For faster inference or fine-tuning, a GPU such as the RTX 4090 or the 24 GB RTX 3090 is recommended.
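
A minimal sketch of that setup, assuming the gated Hugging Face checkpoint tiiuae/falcon-180B and the transformers + bitsandbytes stack; the article's exact loading code may differ:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "tiiuae/falcon-180B"  # gated repo; requires accepting TII's license

# 4-bit NF4 quantization via bitsandbytes shrinks the weights to roughly 90 GB.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" lets accelerate split the layers across the GPU(s),
# CPU RAM, and, as a last resort, disk, filling the fastest devices first.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

inputs = tokenizer("Falcon 180B is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```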

Key takeaways:

  • The Technology Innovation Institute (TII) of Abu Dhabi has released Falcon 180B, a 180-billion-parameter model that demonstrates strong performance and ranks first on the Open LLM leaderboard.
  • Running Falcon 180B on a standard computer can be challenging due to its size and the intensive computing required. However, it is possible to run it on consumer hardware by upgrading the computer and using a quantized version of the model.
  • Quantization of Falcon 180B to a lower precision, such as 4-bit precision, significantly reduces its memory consumption, making it possible to run the model on a reasonably affordable computer with 100 GB of memory.
  • For fast inference or fine-tuning, a GPU such as the RTX 4090 or the 24 GB RTX 3090 is recommended. Without a GPU, fine-tuning would be too slow, but inference is possible with a high-end CPU and software optimized for CPU inference, such as llama.cpp (see the sketch after this list).
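
The article names llama.cpp for CPU inference but does not show an invocation. A minimal sketch using the llama-cpp-python bindings, with a hypothetical 4-bit GGUF conversion of Falcon 180B as the model file (producing or downloading that file is a separate step):

```python
from llama_cpp import Llama  # Python bindings for llama.cpp

# Hypothetical filename: a 4-bit (Q4_K_M) GGUF conversion of Falcon 180B.
llm = Llama(
    model_path="falcon-180b.Q4_K_M.gguf",
    n_ctx=2048,    # context window
    n_threads=16,  # tune to your CPU's physical core count
)

out = llm("Falcon 180B is", max_tokens=50)
print(out["choices"][0]["text"])
```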