The article also introduces SmolTalk, the SFT dataset used to build the SmolLM2 instruct models, created with distilabel, and Smol-tools, a collection of lightweight AI-powered tools built with llama.cpp and small language models. It provides scripts for launching pre-training with nanotron and an example script for finetuning SmolLM2 with TRL and PEFT. The models can be evaluated with lighteval, and more detailed evaluations for each model size are available in the model cards in the collection.
Key takeaways:
- SmolLM2 is a family of compact language models available in three sizes: 135M, 360M, and 1.7B parameters, capable of solving a wide range of tasks and lightweight enough to run on-device.
- The models can be used with a range of tools, including transformers, TRL, llama.cpp, MLX, and transformers.js, making them suitable for both heavyweight server-side and lightweight on-device applications.
- Smol-tools is a collection of lightweight AI-powered tools built with llama.cpp and small language models, designed to run locally without requiring expensive GPU resources.
- SmolTalk is the SFT dataset used for building the SmolLM2 instruct models, created with distilabel; the synthetic data pipelines behind it can be inspected and run by following the distilabel_pipelines README.
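To make the on-device usage concrete, here is a minimal sketch of running a SmolLM2 instruct model with transformers; the checkpoint id `HuggingFaceTB/SmolLM2-135M-Instruct` is assumed to follow the Hugging Face Hub naming of the collection, and the prompt is just an example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub checkpoint id for the smallest instruct model
checkpoint = "HuggingFaceTB/SmolLM2-135M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Format a single-turn conversation with the model's chat template
messages = [{"role": "user", "content": "What is the capital of France?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

# Greedy decoding; only the newly generated tokens are decoded
outputs = model.generate(input_ids, max_new_tokens=50, do_sample=False)
response = tokenizer.decode(
    outputs[0][input_ids.shape[-1]:], skip_special_tokens=True
)
print(response)
```

The 135M checkpoint is small enough to run on CPU, which is what makes the on-device claim in the takeaways plausible; the larger 360M and 1.7B variants follow the same API.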