Diffusion transformers are the key behind OpenAI's Sora -- and they're set to upend GenAI | TechCrunch

Feb 28, 2024 - techcrunch.com
OpenAI's Sora, a GenAI model that can generate videos and interactive 3D environments, utilizes an AI model architecture known as the diffusion transformer. This technology, which also powers Stability AI’s image generator, Stable Diffusion 3.0, is expected to revolutionize the GenAI field by enabling models to scale up beyond previous limits. The diffusion transformer was developed by Saining Xie, a computer science professor at NYU, and William Peebles, co-lead of Sora at OpenAI, by combining two machine learning concepts: diffusion and the transformer.
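To make the "diffusion" half of the combination concrete, below is a minimal sketch of the forward (noising) process that diffusion models are built on. The schedule values and names here are illustrative assumptions, not taken from Sora or any particular codebase.

```python
import torch

T = 1000                                    # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)       # a simple linear noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def add_noise(x0: torch.Tensor, t: int) -> tuple[torch.Tensor, torch.Tensor]:
    """Corrupt a clean sample x0 to timestep t:
    x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * noise."""
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t]
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    return x_t, noise

# Training reduces to predicting `noise` from (x_t, t); generation runs the
# process in reverse, starting from pure noise. The "backbone" the article
# discusses is the network that makes this prediction.
x0 = torch.randn(1, 4, 32, 32)              # e.g. a latent image
x_t, eps = add_noise(x0, t=500)
```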

Diffusion transformers are considered a significant upgrade to the standard diffusion architecture, replacing the complex U-Net backbone with a simpler, more efficient transformer-based design. Transformers are known for their "attention mechanism," which weighs the relevance of every piece of input data and draws on those weights to generate the output. This makes the architecture highly parallelizable, allowing larger and larger models to be trained with manageable increases in compute. Although both diffusion and transformers have been around for years, the value of combining them as a scalable backbone was only recently recognized, leading to its adoption in projects like Sora and Stable Diffusion.
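As a rough illustration of what replaces the U-Net, here is a hypothetical DiT-style transformer block operating on a sequence of image-patch tokens. Dimensions, layer names, and the additive timestep conditioning are simplifying assumptions for the sketch (the published DiT design uses adaptive layer norm), not the actual Sora or Stable Diffusion 3.0 implementation.

```python
import torch
import torch.nn as nn

class DiTBlock(nn.Module):
    """One transformer block over patch tokens, standing in for a U-Net stage."""

    def __init__(self, dim: int = 384, heads: int = 6):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        # Self-attention weighs the relevance of every patch token to every other.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, tokens: torch.Tensor, t_emb: torch.Tensor) -> torch.Tensor:
        # Simplest possible timestep conditioning: add the embedding to every token.
        h = tokens + t_emb.unsqueeze(1)
        normed = self.norm1(h)
        attn_out, _ = self.attn(normed, normed, normed)
        h = h + attn_out
        return h + self.mlp(self.norm2(h))

# Every token is processed in parallel, which is what makes scaling the model
# largely a matter of adding compute.
x = torch.randn(2, 64, 384)   # batch of 2 images as 8x8 grids of patch tokens
t = torch.randn(2, 384)       # embedding of the diffusion timestep
out = DiTBlock()(x, t)        # shape: (2, 64, 384)
```

Stacking blocks like this in place of a U-Net is, in spirit, the "simple swap-in" the article describes.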

Key takeaways:

  • The diffusion transformer, an AI model architecture, is set to transform the GenAI field by enabling GenAI models to scale up beyond what was previously possible. It was developed by Saining Xie and William Peebles and is used in OpenAI's Sora and Stability AI's Stable Diffusion 3.0.
  • Diffusion transformers replace the U-Net backbone in diffusion models, delivering an efficiency and performance boost. They are simpler and more parallelizable than other model architectures, allowing for larger models to be trained with manageable increases in compute.
  • The diffusion transformer's importance as a scalable backbone model has only been recognized recently. It should be a simple swap-in for existing diffusion models, whether they generate images, videos, audio, or other forms of media.
  • Saining Xie envisions a future where the domains of content understanding and creation are integrated within the framework of diffusion transformers. He believes this integration requires the standardization of underlying architectures, with transformers being an ideal candidate.