Introducing PlayHT2.0: The state-of-the-art Generative Voice AI Model for Conversational Speech

PlayHT has introduced a new AI model, PlayHT2.0, which is designed to generate conversational speech and introduces the concept of emotions to Generative Voice AI. The model, which is currently in closed beta, has been trained on over 1 million hours of speech across multiple languages, accents, and speaking styles. It can generate human-like conversations, replicate voices with stunning accuracy from as little as 3 seconds of speech, and clone and generate voices in almost any language or accent. The model can also understand and apply emotions and talking styles to any voice in real time.

The PlayHT2.0 model is a significant improvement over the previous PlayHT1.0 model, whose limitations included poor zero-shot capabilities, short speech generations, no control over speech styles or emotions, and support for English only. The new model is more robust and has reduced latency to conversational real-time levels, generating speech in less than 800ms. The model is now available through the PlayHT Studio and API in alpha, with major updates expected in the coming weeks.

Key takeaways

PlayHT has introduced a new Generative Text-to-Voice AI Model, PlayHT2.0, that can generate conversational speech and introduces the concept of emotions to Generative Voice AI.
The new model has improved capabilities including real-time speech generation, instant voice cloning, cross-language and accent cloning, and directing emotions.
PlayHT2.0 was trained on a dataset of more than 1 million hours of speech across multiple languages, accents, and speaking styles, and can generate speech in less than 800ms.
The model is currently available in alpha through PlayHT's Studio and API, with major updates expected in the coming weeks to further improve its quality, speed, and capabilities.
