Despite these improvements, the new transcription models have limitations, particularly with Indic and Dravidian languages, where the word error rate can reach 30%. Unlike its earlier Whisper releases, OpenAI does not plan to release the new transcription models openly, citing their size and complexity. The company says it wants to reserve open-source releases for models well-suited to specific needs, particularly those that can run on end-user devices.
Key takeaways:
- OpenAI is adding new transcription and voice-generating AI models to its API, improving on its previous releases.
- The new text-to-speech model, "gpt-4o-mini-tts," offers more nuanced and realistic-sounding speech and is more "steerable" than previous models.
- OpenAI's new speech-to-text models, "gpt-4o-transcribe" and "gpt-4o-mini-transcribe," are designed to replace the Whisper model, offering improved accuracy and reduced hallucinations.
- OpenAI does not plan to make its new transcription models openly available, citing their size and the need for thoughtful open-source releases.