OpenAI's New Image Generator Can Do Near-Perfect Text

OpenAI has introduced new image generation capabilities for ChatGPT built on the GPT-4o model, which renders text significantly better. Previously, the chatbot relied on the DALL-E model for image creation. The new feature, "Images in ChatGPT," is also available in OpenAI's video generation tool, Sora. GPT-4o produces images with an autoregressive approach, which improves text quality and photorealism, although it still struggles with small lettering and non-Latin scripts. The model is designed to follow instructions more accurately, but it takes longer to generate outputs and remains prone to hallucinations.

OpenAI has implemented robust safeguards to address safety and misinformation concerns, embedding C2PA metadata in generated images to identify them as AI-created. However, this metadata can be easily removed, especially on social media platforms. Currently, the image generation feature is exclusive to OpenAI's $200-per-month Pro subscription, with plans to expand access to Plus and free users soon.

Key takeaways

OpenAI has introduced new image generation capabilities for ChatGPT, using the GPT-4o model, which improves text rendering and follows instructions better.
The GPT-4o model uses an autoregressive approach for image generation, unlike the diffusion model used by DALL-E, and is fine-tuned for more photorealistic images.
Despite improvements, the model still struggles with small lettering and non-Latin scripts and may hallucinate information, raising safety and misinformation concerns.
Currently, GPT-4o image generation is available only to subscribers on OpenAI's $200-per-month Pro plan, with plans to expand access to more users in the future.
