
Stable Diffusion 3.0 debuts new diffusion transformer architecture to reinvent text-to-image gen AI

Feb 22, 2024 - venturebeat.com
Stability AI has released an early preview of its next-generation flagship text-to-image generative AI model, Stable Diffusion 3.0. The new model aims to offer improved image quality and better performance in generating images from multi-subject prompts, with significantly improved typography. The model is based on a new architecture, a diffusion transformer, similar to the one used in OpenAI's Sora model. It will be available in multiple model sizes ranging from 800M to 8B parameters.

Stable Diffusion 3.0 uses diffusion transformers and flow matching for image generation, replacing the commonly used U-Net backbone with a transformer operating on latent image patches. This approach allows for more efficient use of compute and outperforms other forms of diffusion image generation. The model also benefits from flow matching, a new method for training Continuous Normalizing Flows (CNFs) to model complex data distributions. The company is also working on 3D image and video generation capabilities.
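The two ideas in that paragraph can be sketched with a few lines of NumPy. The `patchify` and `flow_matching_target` functions below are illustrative stand-ins, not Stability AI's implementation: the first shows how a latent image is split into the token sequence a diffusion transformer attends over (in place of a U-Net's convolutions), and the second shows the linear interpolation path and constant-velocity target used in rectified-flow-style flow matching.

```python
import numpy as np

def patchify(latent: np.ndarray, patch_size: int) -> np.ndarray:
    """Split a (C, H, W) latent into a sequence of flattened patches.

    Returns shape (num_patches, C * patch_size**2) -- the token
    sequence a diffusion transformer would process with attention.
    """
    c, h, w = latent.shape
    assert h % patch_size == 0 and w % patch_size == 0
    p = patch_size
    # (C, H/p, p, W/p, p) -> (H/p, W/p, C, p, p) -> (N, C*p*p)
    patches = latent.reshape(c, h // p, p, w // p, p)
    patches = patches.transpose(1, 3, 0, 2, 4)
    return patches.reshape(-1, c * p * p)

def flow_matching_target(x0: np.ndarray, x1: np.ndarray, t: float):
    """Linear probability path between noise x0 and data x1.

    Flow matching trains the network to predict the velocity of this
    path (here the constant x1 - x0) at the interpolated point x_t,
    rather than the added noise as in standard diffusion training.
    """
    xt = (1.0 - t) * x0 + t * x1
    velocity = x1 - x0
    return xt, velocity
```

For example, a 4-channel 8×8 latent with `patch_size=2` yields 16 tokens of dimension 16; at `t=0.25` the interpolant sits a quarter of the way along the straight line from noise to data.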

Key takeaways:

  • Stability AI is previewing its Stable Diffusion 3.0, a next-generation flagship text-to-image generative AI model that aims to provide improved image quality and better performance in generating images from multi-subject prompts.
  • The new model will also offer significantly better typography than prior Stable Diffusion models, enabling more accurate and consistent spelling inside generated images.
  • Stable Diffusion 3.0 is based on a new architecture, the diffusion transformer, similar to the one used in the recent OpenAI Sora model.
  • Stability AI is also working on 3D image generation and video generation capabilities, with the aim of creating open models that can be used anywhere and adapted to any need.
