The company has released a "red-teaming" report detailing the strengths and risks of GPT-4o. The report highlights that the model refuses to identify people by their voice, declines to answer loaded questions, and blocks prompts for violent and sexually explicit language. It also disallows certain categories of content outright, such as discussions relating to extremism and self-harm. OpenAI maintains that these mitigations and safeguards have made the model safer.
Key takeaways:
- OpenAI’s GPT-4o, the AI model that powers Advanced Voice Mode in ChatGPT, was trained on voice, text, and image data, which has led to some unusual behaviors, such as unexpectedly imitating the user's voice or shouting at random.
- The company has implemented a "system-level mitigation" to prevent these behaviors and has added safeguards against the generation of inappropriate sound effects and nonverbal vocalizations.
- Because GPT-4o might infringe on music copyright, OpenAI has instructed it not to sing and has implemented filters to enforce this. It's unclear whether these restrictions will be lifted when Advanced Voice Mode is rolled out to more users.
- OpenAI has made GPT-4o safer by updating text-based filters to work on audio conversations, training it to refuse requests for copyrighted content, and blocking prompts for violent and sexually charged language, among other measures.