Inception's diffusion-based large language models (DLMs) reportedly leverage GPUs more efficiently than traditional autoregressive LLMs: rather than generating text one token at a time, a diffusion model generates and refines blocks of tokens in parallel, which the company says lets its models run up to 10 times faster at a tenth of the cost. Inception claims its "small" coding model matches the performance of OpenAI's GPT-4o mini while being more than 10 times faster, and that its "mini" model outperforms small open-source models such as Meta's Llama 3.1 8B while generating more than 1,000 tokens per second. If these claims hold up, the approach could significantly change how language models are built and deployed.
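The intuition behind the reported speedup can be seen in a counting argument. Below is a minimal, purely illustrative Python sketch, not Inception's implementation, with all names and numbers being assumptions: an autoregressive decoder needs one dependent forward pass per token, while a diffusion-style decoder's sequential cost scales with the number of refinement passes rather than the sequence length.

```python
# Conceptual sketch (not Inception's method): why parallel refinement
# can need far fewer sequential steps than token-by-token decoding.
# SEQ_LEN and REFINE_STEPS are illustrative values, not measured ones.

SEQ_LEN = 16        # tokens to produce
REFINE_STEPS = 4    # hypothetical number of denoising passes


def autoregressive_steps(seq_len: int) -> int:
    """An autoregressive LM runs one forward pass per token, so the
    number of dependent (sequential) steps grows with sequence length."""
    return seq_len


def diffusion_steps(refine_steps: int) -> int:
    """A diffusion-style LM refines every position in parallel, so its
    sequential step count depends on refinement passes, not length."""
    return refine_steps


if __name__ == "__main__":
    ar = autoregressive_steps(SEQ_LEN)
    dlm = diffusion_steps(REFINE_STEPS)
    print(f"autoregressive:  {ar} sequential steps for {SEQ_LEN} tokens")
    print(f"diffusion-style: {dlm} sequential steps for {SEQ_LEN} tokens")
```

Fewer dependent steps means more work per GPU launch and less idle hardware between steps, which is the plausible mechanism behind the efficiency claims; actual speedups would depend on model size, pass count, and implementation details not disclosed here.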
Key takeaways:
- Inception, founded by Stanford professor Stefano Ermon, has developed diffusion-based large language models (DLMs) that it says outperform traditional LLMs in speed and efficiency.
- According to the company, its DLMs run up to 10 times faster than traditional LLMs at a tenth of the cost.
- Inception's models leverage GPUs more efficiently, potentially changing the way language models are built and deployed.
- The company claims its "small" coding model is as good as OpenAI's GPT-4o mini, and that its "mini" model outperforms Meta's Llama 3.1 8B while generating more than 1,000 tokens per second.