OpenELM, short for Open-source Efficient Language Models, is targeted at on-device applications, much like recent models from rivals Google, Samsung, and Microsoft. The models were pre-trained on public datasets totaling 1.8 trillion tokens from sources such as Reddit, Wikipedia, and arXiv.org. While not bleeding-edge in performance, the models hold up fairly well, especially the 450-million-parameter instruct variant. The models are expected to improve over time, and it will be interesting to see how they are used across different applications.
Key takeaways:
- Apple has released OpenELM, a new family of open-source large language models (LLMs) that can run entirely on a single device, on AI code community Hugging Face (a loading sketch follows these takeaways).
- There are eight OpenELM models in total – four pre-trained and four instruction-tuned – spanning parameter counts from 270 million to 3 billion.
- Apple is offering the weights of its OpenELM models under a “sample code license,” along with various checkpoints from training, statistics on how the models perform, and instructions for pre-training, evaluation, instruction tuning, and parameter-efficient fine-tuning.
- In terms of performance, the OpenELM results shared by Apple show that the models perform fairly well, especially the 450-million-parameter instruct variant. The 1.1-billion-parameter OpenELM variant “outperforms OLMo, which has 1.2 billion parameters, by 2.36% while requiring 2× fewer pre-training tokens.”
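For readers who want to try the models themselves, the sketch below shows one plausible way to load an instruct checkpoint from Hugging Face with the transformers library and generate text. The exact repository id (`apple/OpenELM-450M-Instruct`), the need for `trust_remote_code`, and the use of a Llama-2 tokenizer are assumptions based on how such releases are typically packaged, not details confirmed in this article.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face repo id for the 450M instruction-tuned variant.
model_id = "apple/OpenELM-450M-Instruct"

# Assumes the release ships custom modeling code, hence trust_remote_code=True.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Assumes the models are paired with a Llama-family tokenizer
# (access to meta-llama/Llama-2-7b-hf may require approval on Hugging Face).
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

prompt = "Explain what an on-device language model is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")

# Generate a short completion and print the decoded text.
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Swapping the repo id for one of the other parameter sizes should work the same way, subject to the repository names Apple actually uses.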