In extensive testing, the software delivered speedups of 2.37x to 6.17x on x86 CPUs and 1.37x to 5.07x on ARM CPUs across various model sizes. The code is publicly available. The article was submitted by Shaoguang Mao and has been updated twice since its initial submission.
Key takeaways:
- Recent advances in 1-bit Large Language Models (LLMs), such as BitNet and BitNet b1.58, have made LLMs substantially more efficient in both speed and energy consumption (a brief sketch of why low-bit weights cut compute follows this list).
- These developments have enabled local LLM deployment across a wide range of devices.
- A new software stack is introduced, designed specifically to unlock the full efficiency potential of 1-bit LLMs.
- Extensive experiments show that this software achieves significant speedups, ranging from 2.37x to 6.17x on x86 CPUs and from 1.37x to 5.07x on ARM CPUs, across various model sizes.
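For intuition about where such speedups come from: BitNet b1.58 restricts weights to the ternary values {-1, 0, +1}, so a matrix-vector product needs no weight multiplications at all, only additions, subtractions, and skips of activations. The sketch below is purely illustrative and is not the paper's software stack, whose kernels are not described in this summary; the absmean scaling and the `ternary_matvec` helper are assumptions made for the example.

```python
import numpy as np

def ternary_matvec(W_ternary, x):
    """Matrix-vector product with ternary weights in {-1, 0, +1}.

    Illustrative only: with ternary weights, every "multiplication"
    reduces to adding, subtracting, or skipping an activation, which
    is the property low-bit CPU kernels exploit.
    """
    out = np.zeros(W_ternary.shape[0], dtype=x.dtype)
    for i, row in enumerate(W_ternary):
        out[i] = x[row == 1].sum() - x[row == -1].sum()
    return out

# Hypothetical toy example: quantize a small float weight matrix to
# ternary values (round-and-clip after scaling by the mean absolute
# value, in the style of BitNet b1.58), then compare outputs.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8)).astype(np.float32)
x = rng.normal(size=8).astype(np.float32)

scale = np.abs(W).mean()                    # per-tensor scale
W_t = np.clip(np.round(W / scale), -1, 1)   # ternary {-1, 0, +1}

print(ternary_matvec(W_t, x) * scale)       # approximate result
print(W @ x)                                # full-precision reference
```

An optimized kernel would additionally pack the ternary weights into a few bits each and vectorize the accumulation, but the arithmetic being avoided is the same.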