Researchers have found that while GPT-4 with vision is highly effective at certain tasks, it has significant flaws. These include errors in reproducing mathematical formulas, counting objects in illustrations, describing colors, and extracting text from images. OpenAI is working on mitigations to expand the model's capabilities safely, but it remains a work in progress: it can still exhibit biases and make basic mistakes that a human would not.
Key takeaways:
- OpenAI has released new details about 'GPT-4 with vision', a version of GPT-4 that can interpret images as well as text.
- This version of GPT-4 was previously available only to select users but will soon open to the wider developer community via the GPT-4 Turbo API (see the usage sketch after this list).
- Researchers have found that while GPT-4 with vision can describe complex scenes accurately, it struggles with tasks such as reproducing mathematical formulas, counting objects in illustrations, and extracting text from images.
- OpenAI is working on mitigations and processes to expand GPT-4 with vision's capabilities safely, but it remains a work in progress with potential for biases and errors.
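For developers waiting on the API rollout, the request shape is worth previewing. Below is a minimal sketch using the OpenAI Python SDK's chat-completions interface; the model name `gpt-4-vision-preview` reflects the preview naming at launch and may differ once access widens, and the image URL is a placeholder.

```python
import os

from openai import OpenAI

# Assumes OPENAI_API_KEY is set in the environment and that the
# account has access to the vision-enabled preview model.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# A vision request pairs a text prompt with one or more images in the
# same user message; here the image is referenced by URL (placeholder).
response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumed preview model name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is happening in this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0].message.content)
```

Images can also be supplied inline as base64-encoded data URLs rather than public links, which is useful when the image is generated locally or cannot be hosted.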