OpenAI details Voice Engine speech generation AI

OpenAI has developed Voice Engine, an AI model that generates synthetic speech from user-provided audio samples. The model, which powers ChatGPT features, can analyze a sample of a user's voice and generate synthetic speech that closely resembles it from just 15 seconds of audio. OpenAI opened access to the model for a limited number of partners in late 2023, and they have used it for tasks such as generating voiceovers for educational content and translating videos. OpenAI has not yet made Voice Engine publicly available but is considering its commercial potential.

The company is taking steps to ensure responsible use of the technology, including watermarking synthetic speech files and launching a proactive monitoring initiative. If made commercially available, Voice Engine could compete with existing synthetic speech services, such as Eleven Labs. However, OpenAI may need to develop a more advanced version of Voice Engine to offer more customization options, which could raise the model's price. Voice Engine is not OpenAI's only speech-focused system: in 2022, the company released the code for Whisper, an AI system that can transcribe and translate speech.

Key takeaways

OpenAI has developed Voice Engine, an AI model that can generate synthetic speech based on user-provided audio samples, and uses it to power ChatGPT features.
Voice Engine can analyze a sample of a user’s voice and generate synthetic speech that closely resembles it, requiring only 15 seconds of audio to imitate the speaker.
OpenAI has not yet made Voice Engine publicly available but opened access to the model for a limited number of partners in late 2023, who have applied it to tasks such as generating voiceovers for educational content and translating videos.
If OpenAI decides to make Voice Engine commercially available, it could create more competition for the existing synthetic speech services on the market, such as Eleven Labs Inc.

OpenAI details Voice Engine speech generation AI - SiliconANGLE
