TTT models might be the next frontier in generative AI | TechCrunch

Jul 18, 2024 - news.bensbites.com
The article discusses the limitations of the transformer, the dominant architecture in generative AI, and the search for more efficient alternatives. Transformers, which underpin models like OpenAI’s Sora and Google’s Gemini, are hitting computational roadblocks: the cost of their core attention operation grows steeply with input length, making large inputs inefficient to process and driving up power demand. A new architecture, test-time training (TTT), developed by researchers at Stanford, UC San Diego, UC Berkeley, and Meta, is proposed as a promising alternative. TTT models are claimed to process more data than transformers while consuming less compute power.
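
To make that scaling problem concrete, here is a rough back-of-the-envelope sketch (ours, not the article's): self-attention compares every token with every other token, so its compute grows with the square of the input length. The attention_flops helper and the default model dimension are illustrative assumptions, not figures from the article.

    # Back-of-the-envelope: why attention cost explodes with input length.
    # Illustrative sketch only; constants and projection costs are omitted.
    def attention_flops(seq_len: int, d_model: int = 512) -> int:
        """Approximate FLOPs for one self-attention layer: the n x n score
        matrix (QK^T) and the weighted sum over values each cost ~n^2 * d."""
        return 2 * seq_len * seq_len * d_model

    for n in (1_000, 10_000, 100_000):
        print(f"{n:>7} tokens -> ~{attention_flops(n):.2e} FLOPs")

A 100x longer input costs roughly 10,000x more attention compute, which is the inefficiency the article refers to.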

The TTT model replaces the transformer's "hidden state" with a machine learning model of its own, which encodes the data it processes into representative variables called weights; because only those weights are updated, the state stays the same size no matter how much data flows through it. However, it's too early to confirm whether TTT models will supersede transformers. Other alternatives, such as state space models (SSMs), are also being explored by AI startups like Mistral and AI21 Labs. If successful, these new architectures could make generative AI more accessible and widespread.
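
As a loose illustration of that mechanism (a simplified sketch under our own assumptions, not the researchers' actual code), the snippet below treats the hidden state as a tiny linear model and runs one gradient step on it per incoming token; the reconstruction loss and learning rate are assumptions made for the example.

    # Sketch of the TTT idea: the hidden state is itself a small model
    # whose weights W are trained by one gradient step per token, so the
    # state occupies constant memory regardless of sequence length.
    # Simplified reconstruction, not the published architecture.
    import numpy as np

    def ttt_scan(tokens: np.ndarray, lr: float = 0.1) -> np.ndarray:
        d = tokens.shape[1]
        W = np.zeros((d, d))              # fixed-size "hidden state"
        outputs = []
        for x in tokens:
            pred = W @ x                  # inner model's guess for the token
            grad = np.outer(pred - x, x)  # gradient of 0.5 * ||W x - x||^2
            W -= lr * grad                # one training step at test time
            outputs.append(W @ x)         # read out with the updated state
        return np.stack(outputs)

    out = ttt_scan(np.random.randn(16, 8))
    print(out.shape)  # state is d*d floats whether the input is 16 tokens or 16 million

The property the sketch highlights is the one the article emphasizes: after one token or one million, the state is still just the d-by-d weight matrix.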

Key takeaways:

  • Transformers, the architecture underpinning many leading AI models, are hitting technical roadblocks because they are inefficient at processing vast amounts of data, driving up power demand.
  • A new architecture called test-time training (TTT) has been proposed by researchers from Stanford, UC San Diego, UC Berkeley, and Meta, which can process more data than transformers without consuming as much power.
  • The TTT model replaces the 'hidden state' of transformers with a machine learning model that encodes processed data into representative variables called weights, so the state does not grow with the input and the model stays efficient.
  • While TTT models could supersede transformers, it's too early to say for certain. Other alternatives to transformers, such as state space models (SSMs), are also being explored by AI startups like Mistral and AI21 Labs.