OpenAI's GPT-4 finally meets its match: Scots Gaelic smashes safety guardrails

Jan 31, 2024 - theregister.com
Researchers at Brown University have found that the safety measures preventing OpenAI's GPT-4 from generating harmful text can be bypassed by translating prompts into low-resource languages such as Zulu, Scots Gaelic, or Hmong. Using Google Translate, the team converted harmful English prompts into these languages, submitted them to the model, and translated the responses back into English. This method got past GPT-4's safety guardrails about 79 percent of the time.
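The attack itself is simple plumbing. The sketch below illustrates the translate-query-translate loop the paper describes; the `translate()` helper and its `target` parameter are hypothetical stand-ins for whatever machine-translation backend is used (the researchers used Google Translate), while the model call follows OpenAI's public Python SDK.

```python
# Minimal sketch of the translation-based probe described in the paper:
# translate an English prompt into a low-resource language, query the model,
# then translate its reply back into English.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def translate(text: str, target: str) -> str:
    """Hypothetical wrapper around a machine-translation service
    (the researchers used Google Translate)."""
    raise NotImplementedError("plug in a translation backend here")


def low_resource_probe(prompt_en: str, lang: str = "gd") -> str:
    # 1. Translate the English prompt into a low-resource language
    #    (e.g. 'gd' = Scots Gaelic, 'zu' = Zulu, 'hmn' = Hmong).
    prompt_lr = translate(prompt_en, target=lang)

    # 2. Submit the translated prompt to the model.
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt_lr}],
    ).choices[0].message.content

    # 3. Translate the model's reply back into English.
    return translate(reply, target="en")
```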

The researchers urged AI developers to consider low-resource languages when evaluating their models' safety. OpenAI acknowledged the team's paper and agreed to consider it, though it remains unclear whether the organization is working to address the issue. The study highlights a crucial shift: poor coverage of low-resource languages in model training now poses a risk to all users, not just speakers of those languages.

Key takeaways:

  • OpenAI's GPT-4 safety guardrails can be bypassed by translating prompts into uncommon languages, allowing the AI to generate harmful text.
  • Researchers from Brown University found that they could bypass the safety guardrails about 79 percent of the time using Zulu, Scots Gaelic, Hmong, or Guarani.
  • In these lesser-known languages, the model was more likely to comply with prompts relating to terrorism, financial crime, and misinformation than with prompts about child sexual abuse.
  • The researchers urged developers to consider low-resource languages when evaluating their models' safety, as the deficiency now poses a risk to all large language model users.