Cerebras has also announced a partnership with Qualcomm to use the latter's AI 100 processor for the inference stage of generative AI, in which a trained model makes predictions on live traffic. The partnership applies four techniques to reduce the cost of inference: sparsity, speculative decoding, conversion of the output into a compiled version of the network, and network architecture search. The collaboration is expected to increase the number of tokens processed per dollar spent on the Qualcomm chip by an order of magnitude.
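The announcement names these techniques without describing them. For context, below is a minimal sketch of one of them, speculative decoding: a small, cheap draft model proposes several tokens at a time, and the large, expensive target model only verifies them, amortizing its cost across multiple output tokens. This is a simplified greedy variant with toy stand-in models, not Qualcomm's or Cerebras's actual implementation; the function names and verification rule are illustrative assumptions.

```python
from typing import Callable, List

Token = int
# A "model" here is any function that maps a token prefix to the next
# token it would pick greedily. Both models used below are hypothetical.
Model = Callable[[List[Token]], Token]

def speculative_decode(draft: Model, target: Model,
                       prompt: List[Token], k: int, max_new: int) -> List[Token]:
    """Generate up to max_new tokens: the draft model proposes k tokens
    per round and the target model verifies them."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # Draft model proposes k tokens autoregressively (cheap per call).
        proposed: List[Token] = []
        ctx = list(out)
        for _ in range(k):
            tok = draft(ctx)
            proposed.append(tok)
            ctx.append(tok)
        # Target model verifies the proposals. In a real system all k
        # positions are scored in a single batched forward pass, so the
        # expensive model runs once per k candidate tokens.
        for tok in proposed:
            want = target(out)
            if tok != want:
                out.append(want)  # first mismatch: keep target's token, drop the rest
                break
            out.append(tok)       # match: the drafted token is accepted "for free"
    return out[:len(prompt) + max_new]

# Toy usage with stand-in models: the draft always counts up by one;
# the target agrees except at every fifth position.
draft = lambda ctx: (ctx[-1] + 1) % 100
target = lambda ctx: (ctx[-1] + (1 if len(ctx) % 5 else 2)) % 100
print(speculative_decode(draft, target, prompt=[0], k=4, max_new=10))
```

When draft and target usually agree, as with a small distilled version of the same model, most rounds emit several tokens for one pass of the large model, which is where the cost-per-token saving comes from.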
Key takeaways:
- Cerebras Systems has unveiled the Wafer Scale Engine 3 (WSE-3), the third generation of its AI chip and the world's largest semiconductor. The WSE-3 doubles its predecessor's peak performance, from 62.5 petaFLOPS to 125 petaFLOPS.
- The WSE-3 moves from a 7-nanometer to a 5-nanometer process, boosting the transistor count from 2.6 trillion in the WSE-2 to 4 trillion. The chip is manufactured by TSMC, the world's largest contract chipmaker.
- Cerebras' CS-3 computer, built around the WSE-3, can handle a theoretical large language model of 24 trillion parameters, an order of magnitude more than top-of-the-line generative AI models such as OpenAI's GPT-4.
- Cerebras has also unveiled a partnership with Qualcomm to use the latter's AI 100 processor for generative AI inference, applying four cost-reduction techniques expected to improve the number of tokens processed per dollar by an order of magnitude.