The researchers' model uses about 10 times less memory and runs about 25% faster than comparable models on standard GPUs. The team also prototyped its custom hardware on a field-programmable gate array (FPGA), which let them fully exploit the energy-saving features built into the neural network. On this custom hardware, the model generated text faster than a human can read it while drawing just 13 watts of power, more than 50 times the efficiency of GPUs. The researchers believe that with further development the technology could become even more energy efficient.
Key takeaways:
- Researchers from UC Santa Cruz have developed a method to eliminate the most energy-consuming element of running large language models, matrix multiplication, while maintaining performance.
- The new method involves forcing all numbers within the matrices to be ternary, meaning each can only be -1, 0, or +1, which reduces computation to summing numbers rather than multiplying them, and adjusting the strategy for how matrices communicate with each other (see the sketch after this list).
- With this approach and custom hardware, they were able to power a billion-parameter-scale language model on just 13 watts, more than 50 times more efficient than typical hardware.
- Despite the reduced energy consumption, the new open-source model matches the performance of state-of-the-art models such as Meta’s Llama.
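To illustrate the core idea, here is a minimal sketch (not the team's actual implementation) of why ternary weights remove the need for multiplication: when every weight is -1, 0, or +1, a matrix-vector product collapses into additions and subtractions. The function name and example values below are purely illustrative.

```python
import numpy as np

def ternary_matvec(W, x):
    """Apply a ternary weight matrix W (entries in {-1, 0, +1}) to a vector x
    using only additions and subtractions, with no multiplications."""
    y = np.zeros(W.shape[0], dtype=x.dtype)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            if W[i, j] == 1:      # +1 entry: add the input element
                y[i] += x[j]
            elif W[i, j] == -1:   # -1 entry: subtract the input element
                y[i] -= x[j]
            # 0 entries contribute nothing and are skipped entirely
    return y

# Toy example: a small ternary weight matrix and an input vector
W = np.array([[ 1, 0, -1],
              [-1, 1,  0]])
x = np.array([2.0, 3.0, 5.0])

print(ternary_matvec(W, x))  # [-3.  1.]
print(W @ x)                 # same result via an ordinary matrix multiplication
```

The zero entries are simply skipped, which is part of why this style of computation maps so well onto custom low-power hardware: the most expensive arithmetic units (multipliers) are not needed at all.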