Despite these improvements, the new transcription models have limitations, particularly with Indic and Dravidian languages, where the word error rate can reach 30%. Unlike its earlier Whisper releases, OpenAI does not plan to release the new transcription models openly, citing their size and complexity. The company says it wants to reserve open-source releases for models well-suited to specific needs, particularly those that can run on end-user devices.
Key takeaways:
- OpenAI is adding new transcription and voice-generating AI models to its API, improving on its previous releases.
- The new text-to-speech model, "gpt-4o-mini-tts," offers more nuanced and realistic-sounding speech and is more "steerable" than previous models.
- OpenAI's new speech-to-text models, "gpt-4o-transcribe" and "gpt-4o-mini-transcribe," are designed to replace the Whisper model, offering improved accuracy and reduced hallucinations.
- OpenAI does not plan to make its new transcription models openly available, citing their size and the need for thoughtful open-source releases.