LLaVA-1.5, an improved version of LLaVA, pairs a visual encoder with an open-source chatbot so that it can reason over both images and text, though it still struggles with complex scenes and text recognition. Adept's Fuyu-8B takes aim at unstructured data like charts, graphs, and screens, but ships without built-in moderation mechanisms or prompt-injection guardrails, raising concerns about potential misuse. Despite these limitations, the trend toward open-sourcing multimodal models continues.
Key takeaways:
- OpenAI's GPT-4V is a multimodal model that understands both text and images, but it has been criticized for failing to recognize hate symbols and for discriminating against certain demographics.
- Despite these issues, other companies and independent developers continue to release open-source multimodal models, such as LLaVA-1.5, from a team of researchers at the University of Wisconsin-Madison, Microsoft Research, and Columbia University, and Fuyu-8B, from Adept.
- LLaVA-1.5 shows promise at understanding images and their context, but struggles to recognize text and interpret complex images. Commercial use is also restricted, because part of its training data was generated by ChatGPT; a minimal usage sketch follows this list.
- Adept's Fuyu-8B is designed to understand unstructured data such as charts, diagrams, and software interfaces. However, it lacks built-in moderation mechanisms or prompt-injection guardrails, raising concerns about potential misuse (see the second sketch below).
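For readers who want to experiment, here is a minimal sketch of running LLaVA-1.5 through Hugging Face Transformers. It assumes the community-converted `llava-hf/llava-1.5-7b-hf` checkpoint and LLaVA-1.5's `USER: <image> ... ASSISTANT:` prompt template; the image URL is a placeholder, and a GPU with enough memory for the 7B weights in fp16 is assumed.

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

# Community-converted LLaVA-1.5 checkpoint on the Hugging Face Hub (assumption).
model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Placeholder image URL; substitute any image you want described.
url = "https://example.com/photo.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# LLaVA-1.5 expects a chat-style prompt with an <image> placeholder token.
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(
    model.device, torch.float16
)

output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```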
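Fuyu-8B is likewise available through Transformers. The sketch below follows the general pattern from its Hugging Face model card, assuming the `adept/fuyu-8b` checkpoint, a local `chart.png` as a placeholder input, and a free-form question prompt; as the takeaway above notes, any content moderation has to be layered around the model, since none is built in.

```python
import torch
from PIL import Image
from transformers import FuyuProcessor, FuyuForCausalLM

# Official Fuyu-8B checkpoint released by Adept on the Hugging Face Hub.
model_id = "adept/fuyu-8b"
processor = FuyuProcessor.from_pretrained(model_id)
model = FuyuForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Placeholder input; Fuyu takes charts, diagrams, and UI screenshots directly.
image = Image.open("chart.png")
prompt = "What trend does this chart show?\n"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(
    model.device, torch.float16
)
outputs = model.generate(**inputs, max_new_tokens=60)

# Decode only the newly generated tokens, skipping the echoed prompt.
new_tokens = outputs[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])
```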