GitHub - coqui-ai/TTS: 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Jun 11, 2024 - github.com
Coqui.ai has announced several updates to its Text-to-Speech (TTS) technology. TTSv2 now supports 16 languages and offers improved performance. The TTS fine-tuning code has been released, along with example recipes. TTS can now stream with less than 200ms latency. The company has also released a production TTS model that can speak 13 languages. Other updates include the availability of Bark for inference with unconstrained voice cloning, support for approximately 1100 Fairseq models in TTS, and faster inference with Tortoise.
The company provides several resources for users, including documentation, installation guides, and a roadmap. It offers a Python API for running multi-speaker and multilingual models, single-speaker models, voice conversion, and voice cloning, as well as a command-line interface for synthesizing speech. A Docker image is available for users who want to try TTS without installing it. TTS has been tested on Ubuntu 18.04 with Python versions 3.9 through 3.11 (>= 3.9, < 3.12).
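A minimal sketch of how the Python API can be used, assuming the package is installed via `pip install TTS`. The model name follows the project's `tts_models/<lang>/<dataset>/<model>` naming convention; the text, output path, and reference-audio filename here are illustrative placeholders.

```python
# Sketch of Coqui TTS Python API usage (assumes `pip install TTS`).
# Model names follow the tts_models/<lang>/<dataset>/<model> convention.
MODEL_NAME = "tts_models/multilingual/multi-dataset/xtts_v2"


def synthesize(text: str, out_path: str, speaker_wav: str, language: str = "en") -> None:
    """Synthesize speech for `text` and write a WAV file to `out_path`.

    `speaker_wav` is a short reference clip used for voice cloning;
    the model weights are downloaded automatically on first use.
    """
    from TTS.api import TTS  # imported lazily; requires the TTS package

    tts = TTS(MODEL_NAME)
    tts.tts_to_file(
        text=text,
        file_path=out_path,
        speaker_wav=speaker_wav,
        language=language,
    )


if __name__ == "__main__":
    # Placeholder filenames for illustration only.
    synthesize("Hello world!", "out.wav", speaker_wav="reference.wav")
```

The command-line interface offers a similar workflow: `tts --text "Hello world!" --model_name "tts_models/multilingual/multi-dataset/xtts_v2" --out_path out.wav` synthesizes speech without writing any Python code.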
Key takeaways:
Coqui.ai has released TTSv2 with 16 languages and improved performance.
TTS can now stream with less than 200ms latency.
Bark is now available for inference with unconstrained voice cloning.
Coqui.ai's TTS now supports Tortoise with faster inference.