However, OpenAI acknowledges risks with these new capabilities, such as misuse of synthetic voices and privacy concerns with image search. The company is taking a cautious approach, limiting the bot's ability to analyze and make direct statements about people, and controlling the use of the new text-to-speech model to prevent impersonation or fraud. As ChatGPT evolves into a multimodal virtual assistant, OpenAI is grappling with the challenge of expanding its capabilities while mitigating potential problems.
Key takeaways:
- OpenAI is rolling out a new version of ChatGPT that allows users to prompt the AI bot by speaking aloud or uploading a picture, in addition to typing sentences.
- The company is using its Whisper model for speech-to-text and is introducing a new text-to-speech model that can generate human-like audio from text and a few seconds of sample speech.
- OpenAI is also introducing an image search feature, similar to Google Lens, where ChatGPT tries to understand what the user is asking about from a photo and responds accordingly.
- Despite these advancements, OpenAI has deliberately limited ChatGPT's ability to analyze and make direct statements about people, citing accuracy and privacy reasons, and remains cautious about potential misuse of the new capabilities.