You can now prompt ChatGPT with pictures and voice commands

OpenAI is updating its AI-powered bot, ChatGPT, to allow users to interact with it by speaking aloud or uploading a picture, in addition to typing. The voice chat feature will convert spoken questions to text, feed that text to the language model, and convert the response back to speech. The image search feature, similar to Google Lens, will attempt to understand and respond to queries based on uploaded images. These features will first be available to paying customers, and then to everyone else.

However, OpenAI acknowledges risks with these new capabilities, such as misuse of synthetic voices and privacy concerns with image search. The company is taking a cautious approach, limiting the bot's ability to analyze and make direct statements about people, and controlling the use of the new text-to-speech model to prevent impersonation or fraud. As ChatGPT evolves into a multimodal virtual assistant, OpenAI is grappling with the challenge of expanding its capabilities while mitigating the problems they may create.

Key takeaways:

OpenAI is rolling out a new version of ChatGPT that allows users to prompt the AI bot by speaking aloud or uploading a picture, in addition to typing sentences.
The company is using its Whisper model for speech-to-text work and is introducing a new text-to-speech model that can generate human-like audio from text and a few seconds of sample speech.
OpenAI is also introducing an image search feature, similar to Google Lens, where ChatGPT tries to understand what the user is asking about from a photo and responds accordingly.
Despite these advancements, OpenAI has deliberately limited ChatGPT's ability to analyze and make direct statements about people for accuracy and privacy reasons, and is cautious about the potential misuse of these new capabilities.
