ChatGPT now sees, hears, and speaks: adds new voice chat and image features

OpenAI has announced the addition of voice and image capabilities to its AI chatbot, ChatGPT, in one of the most significant updates to the product. The voice feature, similar to Alexa, allows users to have back-and-forth conversations with ChatGPT, powered by large language models. The image feature, akin to Google Lens, enables users to show photos to ChatGPT for information or advice. These features will be rolled out to Plus and Enterprise customers initially, with plans for expansion to all users later.

The company has integrated its latest image generation model, DALL-E 3, into ChatGPT, enhancing its visual content creation capabilities. OpenAI's voice technology can create lifelike synthetic voices from a few seconds of speech, but to mitigate risks like impersonation and fraud, it's being used specifically for ChatGPT's voice chat. The company also mentioned rigorous testing to ensure responsible usage of its vision-based models.

Key takeaways

OpenAI is adding voice and image capabilities to its AI chatbot, ChatGPT, allowing it to see, hear, and speak, and enabling users to converse with it using images and speech.
The voice feature works like Alexa and offers five voice options for users to choose from. The image feature lets users show ChatGPT photos to get information or advice.
The new features will be rolled out to Plus and Enterprise customers over the next two weeks, with plans for expansion to all users at a later date. The image feature will be accessible on all platforms, while voice functionality will be limited to iOS and Android.
OpenAI's latest voice technology can create lifelike synthetic voices from just a few seconds of speech, and its vision-based models have undergone rigorous testing for responsible usage.
