The GPT-4o system card also explains how voice imitation could occur. The model can synthesize almost any type of sound found in its training data, including sound effects and music, and can in principle imitate any voice from a short audio clip. OpenAI constrains this capability by providing an authorized voice sample at the start of each conversation. The company also runs a separate system that detects whether the model is generating unauthorized audio, restricting the model to a set of pre-approved voices.
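OpenAI has not published implementation details for this detection system. A common way to build such a check is speaker verification: embed the generated audio with a speaker-embedding model and compare it against embeddings of the pre-approved voices. The sketch below illustrates that idea only; the `is_authorized_voice` function, the embedding size, and the similarity threshold are all hypothetical, and random vectors stand in for real speaker embeddings.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two speaker embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_authorized_voice(output_emb: np.ndarray,
                        approved_embs: list[np.ndarray],
                        threshold: float = 0.8) -> bool:
    """Return True if the generated audio's speaker embedding is close
    enough to any pre-approved voice (hypothetical threshold)."""
    return any(cosine_similarity(output_emb, ref) >= threshold
               for ref in approved_embs)

# Placeholder vectors stand in for embeddings from a real speaker model;
# OpenAI's actual classifier and threshold are not public.
rng = np.random.default_rng(0)
approved = [rng.standard_normal(256) for _ in range(3)]   # pre-approved voices
generated = approved[0] + 0.05 * rng.standard_normal(256) # near-match output

if is_authorized_voice(generated, approved):
    print("Allowed: output matches a pre-approved voice.")
else:
    print("Blocked: output does not match any approved voice.")
```

In a guardrail like this, audio that fails the check would be suppressed before reaching the user, which matches the behavior the system card describes at a high level.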
Key takeaways:
- During testing, OpenAI's new GPT-4o model was found in rare instances to unintentionally imitate users' voices, an unintended behavior of Advanced Voice Mode.
- OpenAI has safeguards in place to prevent unauthorized voice imitation and is working on improving the safety architecture of the AI chatbot.
- The model can synthesize almost any type of sound found in its training data, including voices imitated from short audio clips, a capability OpenAI constrains by providing an authorized voice sample.
- OpenAI uses a separate system to detect whether the model is generating unauthorized audio, allowing it to speak only in certain pre-approved voices.