The post also provides a step-by-step guide to replicating the process on an Orange Pi device. The authors note that while the current results are promising, there is still room for improvement, particularly around integer-to-float conversions. Future work will focus on bringing LLMs to more affordable devices, refining the software frameworks, and exploring applications such as smart home devices.
Key takeaways:
- The post demonstrates GPU-accelerated Large Language Models (LLMs) running successfully on an affordable embedded device, specifically the Orange Pi 5, using Machine Learning Compilation (MLC) techniques.
- MLC is an emerging technology that compiles and optimizes machine learning workloads for a variety of backends. Here it was used to compile LLMs for the Mali GPU, reusing the existing compilation pipeline without additional hand-written code optimizations.
- The post provides a step-by-step guide for setting up and running LLMs on an Orange Pi device, using both the Command Line Interface (CLI) and the Python API (see the sketch after this list).
- The team's future work will focus on integrating LLMs into affordable devices, refining software frameworks, and exploring broader applications such as smart home devices.
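To make the Python API route concrete, here is a minimal sketch of what running a compiled model might look like through MLC LLM's `ChatModule` interface. The model name, quantization suffix, and device string below are illustrative assumptions, not values taken from the post; substitute the weights and model library you actually compiled for your Orange Pi.

```python
# Minimal sketch: querying a quantized LLM via MLC LLM's Python API.
# Model name, quantization suffix, and device string are illustrative;
# point them at the artifacts compiled for your own Orange Pi setup.
from mlc_chat import ChatModule

# Load the compiled model and target the Mali GPU, which is exposed
# through OpenCL on the Orange Pi 5.
cm = ChatModule(
    model="Llama-2-7b-chat-hf-q4f16_1",
    device="opencl",
)

# Run a single prompt and print the generated response.
output = cm.generate(prompt="What is the capital of Canada?")
print(output)
```

The CLI route described in the post works along the same lines: a prebuilt chat binary is pointed at the same compiled model artifacts and run interactively, with no Python required on the device.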