Introducing OCTAVE (Omni-Capable Text and Voice Engine) • Hume AI

OCTAVE (Omni-Capable Text and Voice Engine) is a next-generation speech-language model that combines the capabilities of existing models like EVI 2, OpenAI’s Voice Engine, ElevenLabs’ TTS Voice Design, and Google DeepMind’s NotebookLM. It can generate voices and personalities from descriptive prompts or brief recordings, allowing for the creation of diverse characters with specific vocal traits and personalities. OCTAVE supports real-time interaction and can generate dialog for multiple interacting speakers, making it suitable for AI systems that require rich communication and detailed instruction-following capabilities.

Despite its advanced speech processing and generation features, OCTAVE maintains language understanding performance comparable to similar-sized frontier LLMs. The model is currently being evaluated for safety and effectiveness by trusted partners, with broader availability planned in the future. OCTAVE aims to enable more realistic and multifaceted AI experiences, allowing users and developers to craft and personalize AI personas for various applications, including real-time group conversations.

Key takeaways:

OCTAVE is a next-generation speech-language model that can generate voices and personalities from descriptive prompts or brief recordings, enabling rich and authentic communication.
The model can clone and adopt any speaker's voice and personality from a noisy recording as brief as 5 seconds.
OCTAVE can generate dialog for multiple interacting characters, switching among them in real time, enhancing AI experiences with multifaceted interactions.
OCTAVE maintains language understanding performance comparable to similar-sized frontier LLMs, making it suitable for AI systems that require detailed instruction following and interface control.
