
Apple releases OpenELM: small, open source AI models designed to run on-device

May 14, 2024 - news.bensbites.com
Apple has released OpenELM, a new family of open-source large language models (LLMs) that can run entirely on a single device, on AI code community Hugging Face. The OpenELM models, which range in size from 270 million to 3 billion parameters, are designed to perform text generation tasks efficiently. The models are available under a "sample code license" and come with training checkpoints, performance stats, and instructions for pre-training, evaluation, and tuning.

OpenELM, short for Open-source Efficient Language Models, is targeted at on-device applications, mirroring on-device efforts from rivals Google, Samsung, and Microsoft. The models were pre-trained on public datasets totaling 1.8 trillion tokens from sources such as Reddit, Wikipedia, and arXiv.org. While not bleeding-edge in performance, the models perform fairly well, especially the 450 million parameter instruct variant. The models are expected to improve over time, and it will be interesting to see how they are used across different applications.
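
Because the checkpoints are hosted on Hugging Face, they can be loaded with the standard transformers API. The snippet below is a minimal sketch, not an official example: it assumes the published apple/OpenELM-270M-Instruct model ID, that the repository ships custom modeling code (hence trust_remote_code=True), and that the gated meta-llama/Llama-2-7b-hf tokenizer referenced on the model card is used; adjust the IDs to whatever the model cards actually specify.

    # Minimal sketch of loading an OpenELM checkpoint with Hugging Face transformers.
    # Assumed (not stated in this article): model ID "apple/OpenELM-270M-Instruct",
    # custom remote modeling code, and the Llama 2 tokenizer per the model card.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "apple/OpenELM-270M-Instruct"   # smallest instruction-tuned variant
    tokenizer_id = "meta-llama/Llama-2-7b-hf"  # gated; requires accepting Meta's license

    tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

    prompt = "Once upon a time there was"
    inputs = tokenizer(prompt, return_tensors="pt")

    # Keep generation short so it runs comfortably on a laptop CPU.
    output_ids = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.2)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))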

Key takeaways:

  • Apple has released OpenELM, a new family of open-source large language models (LLMs) that can run entirely on a single device, on AI code community Hugging Face.
  • There are eight OpenELM models in total – four pre-trained and four instruction-tuned – spanning parameter counts from 270 million to 3 billion.
  • Apple is offering the weights of its OpenELM models under a “sample code license,” along with checkpoints from training, stats on how the models perform, and instructions for pre-training, evaluation, instruction tuning, and parameter-efficient fine-tuning.
  • In terms of performance, the OpenELM results shared by Apple show that the models perform fairly well, especially the 450 million parameter instruct variant. The 1.1 billion parameter OpenELM variant “outperforms OLMo, which has 1.2 billion parameters, by 2.36% while requiring 2× fewer pre-training tokens.”