ChatGPT can now see, hear, and speak

OpenAI is introducing new voice and image capabilities to ChatGPT, allowing users to engage in voice conversations and show images to the AI for discussion. The voice feature, powered by a new text-to-speech model, will be available on iOS and Android, while the image feature will be available on all platforms. Both features will be rolled out to Plus and Enterprise users over the next two weeks, and users can opt into them in their settings.

The new features present both opportunities and challenges. The voice technology can be used for creative and accessibility-focused applications, but also poses risks such as potential misuse by malicious actors. The image input feature, powered by multimodal GPT-3.5 and GPT-4, can assist users in a variety of tasks, but also has limitations, particularly in high-stakes domains and non-English languages. OpenAI is taking measures to ensure the safety and responsible usage of these features, and plans to expand access to other user groups soon.

Key takeaways

ChatGPT is introducing new voice and image capabilities, allowing users to have voice conversations and show images to the AI for more interactive discussions.
Voice and image features will be rolled out to Plus and Enterprise users over the next two weeks, with voice available on iOS and Android, and images available on all platforms.
The new voice capability is powered by a text-to-speech model that generates the spoken replies and by Whisper, an open-source speech recognition system that transcribes the user's speech into text, while image understanding is powered by multimodal GPT-3.5 and GPT-4.
OpenAI is deploying these new capabilities gradually to ensure safety and beneficial use, acknowledging potential risks such as impersonation and fraud with voice technology, and challenges with image interpretation in high-stakes domains.
