Last year, a team from Microsoft Research Asia created BitNet, the first 1-bit QAT method for LLMs, which proved more energy efficient than PTQ methods. In February, the team announced BitNet b1.58, whose parameters take only the values -1, 0, or 1, making it faster and more energy efficient than full-precision networks. Meanwhile, a team from Harbin Institute of Technology developed a method called OneBit that combines elements of both PTQ and QAT and achieves competitive results while using far less memory. However, current hardware cannot take full advantage of these models; realizing their full benefit will require new hardware that can natively represent each parameter as a -1 or 1 (or 0).
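The article does not spell out how full-precision weights get pushed down to -1, 0, or 1, but the idea fits in a few lines. Below is a minimal NumPy sketch of ternary quantization using an absolute-mean scale; the function name `ternary_quantize` and the exact scaling rule are illustrative assumptions, not code from BitNet b1.58.

```python
import numpy as np

def ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    """Map full-precision weights to {-1, 0, 1} plus one shared scale.

    Illustrative sketch: scale by the mean absolute value, round, and
    clip so the original matrix is approximated by scale * w_ternary.
    """
    scale = np.mean(np.abs(w)) + eps              # one scalar per weight matrix
    w_ternary = np.clip(np.round(w / scale), -1, 1)
    return w_ternary.astype(np.int8), scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
w_t, s = ternary_quantize(w)
print(w_t)                                        # entries are only -1, 0, or 1
print(np.abs(w - s * w_t).mean())                 # average approximation error
```

Because every stored weight is then -1, 0, or 1, multiplying activations by the weight matrix needs only additions, subtractions, and skipped zeros rather than full multiplications, which is part of why such models can run faster and draw less energy than their full-precision counterparts.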
Key takeaways:
- Large language models (LLMs) are becoming larger and more energy-intensive, prompting researchers to find ways to make them smaller and more efficient. One method being explored is to drastically round off the high-precision numbers that store their memories (their parameters, or weights) to just 1 or -1.
- Two general approaches to this are post-training quantization (PTQ) and quantization-aware training (QAT). PTQ has been more popular among researchers; one team introduced a PTQ method called BiLLM that uses 1 bit for most parameters and 2 bits for a few key weights (a code sketch contrasting PTQ and QAT follows this list).
- Microsoft Research Asia developed BitNet, the first 1-bit QAT method for LLMs, which is roughly 10 times as energy efficient as full-precision networks. A newer version, BitNet b1.58, uses parameters that can equal -1, 0, or 1 (about 1.58 bits of information per weight, hence the name), making it even more efficient.
- Quantized models have multiple advantages: they fit on smaller chips, require less data movement between memory and processors, and allow for faster processing. However, current hardware can't fully exploit these models, suggesting a need for new hardware specifically optimized for 1-bit LLMs.
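To make the PTQ/QAT distinction concrete, here is a minimal PyTorch sketch, assuming simple sign-plus-scale binarization and a straight-through estimator; `binarize` and `BinaryLinear` are hypothetical names, not the BiLLM or BitNet implementations. PTQ rounds an already trained weight matrix once, after the fact, while QAT applies the same rounding inside the forward pass during training so the network can adapt to it.

```python
import torch

def binarize(w: torch.Tensor) -> torch.Tensor:
    """Sign binarization with one scale per matrix: w ≈ alpha * sign(w)."""
    alpha = w.abs().mean()
    return alpha * torch.sign(w)

# PTQ: quantize once, after training has finished.
def post_training_quantize(weight: torch.Tensor) -> torch.Tensor:
    return binarize(weight.detach())

# QAT: quantize inside the forward pass and train through it.
class BinaryLinear(torch.nn.Module):
    """Linear layer that uses binarized weights in the forward pass while
    keeping full-precision shadow weights for the gradient updates."""
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = torch.nn.Parameter(0.02 * torch.randn(out_features, in_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w_q = binarize(self.weight)
        # Straight-through estimator: forward uses w_q, backward treats the
        # quantizer as identity so gradients still reach self.weight.
        w_ste = self.weight + (w_q - self.weight).detach()
        return x @ w_ste.t()

layer = BinaryLinear(8, 4)
x = torch.randn(2, 8)
layer(x).sum().backward()                   # gradients flow to layer.weight via the STE
w_ptq = post_training_quantize(layer.weight)  # one-shot PTQ of the same matrix
```

In the PTQ case, accuracy depends entirely on how well the one-shot rounding approximates the trained weights, which is why BiLLM reserves 2 bits for a few key weights; the QAT layer instead learns weights that already tolerate the rounding, at the cost of quantizing throughout training.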