Researchers have found that while GPT-4 with vision is highly effective at certain tasks, it has significant flaws. These include errors in reproducing mathematical formulas, counting objects in illustrations, describing colors, and extracting text from images. OpenAI is working on mitigations to expand the model's capabilities safely, but it remains a work in progress: it can still exhibit biases and make basic mistakes that a human would not.
Key takeaways:
- OpenAI has released new details about 'GPT-4 with vision', a version of GPT-4 that can interpret images as well as text.
- This version of GPT-4 was previously available only to select users but will soon open to the wider developer community via the GPT-4 Turbo API (see the usage sketch after this list).
- Researchers have found that while GPT-4 with vision can describe complex scenes accurately, it struggles with tasks such as reproducing mathematical formulas, counting objects in illustrations, and extracting text from images.
- OpenAI is working on mitigations and processes to expand GPT-4 with vision's capabilities safely, but it remains a work in progress with potential for biases and errors.
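For developers waiting on the API rollout, the request shape is worth previewing. Below is a minimal sketch using the OpenAI Python SDK's chat-completions interface; the model name `gpt-4-vision-preview` reflects the preview naming at launch and may differ once access widens, and the image URL is a placeholder.

```python
import os

from openai import OpenAI

# Assumes OPENAI_API_KEY is set in the environment and that the
# account has access to the vision-enabled preview model.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# A vision request pairs a text prompt with one or more images in the
# same user message; here the image is referenced by URL (placeholder).
response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumed preview model name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is happening in this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response.choices[0].message.content)
```

Images can also be supplied inline as base64-encoded data URLs rather than public links, which is useful when the image is generated locally or cannot be hosted.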