The researchers urged AI developers to consider low-resource languages when evaluating their models' safety. OpenAI acknowledged the team's paper and agreed to consider it, but it remains unclear whether the organization is actively working to address the issue. The study highlights a crucial shift: the deficiency in training AI models on low-resource languages now poses a risk to all users, not just speakers of those languages.
Key takeaways:
- OpenAI's GPT-4 safety guardrails can be bypassed by translating prompts into uncommon, low-resource languages, allowing the model to generate harmful text (a sketch of this translate-query-translate pipeline follows the list).
- Researchers from Brown University found that they could bypass the safety guardrails about 79 percent of the time using Zulu, Scots Gaelic, Hmong, or Guarani.
- When prompted in these lesser-known languages, the model was more likely to comply with requests relating to terrorism, financial crime, and misinformation than with those relating to child sexual abuse.
- The researchers urged developers to consider low-resource languages when evaluating their models' safety, as the deficiency now poses a risk to all large language model users.
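To make the attack surface concrete, below is a minimal sketch of the translate-query-translate loop the researchers describe, framed as a red-teaming evaluation harness. The `translate()` helper, the model name, and the prompt handling are illustrative assumptions, not the team's actual code; any machine-translation service could be substituted, and evaluation prompts would come from a safety benchmark rather than the placeholder shown here.

```python
# Sketch of the translation-based evaluation loop (assumptions noted above).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def translate(text: str, target_lang: str) -> str:
    """Placeholder: plug in any machine-translation service here."""
    raise NotImplementedError


def evaluate_in_low_resource_language(prompt_en: str, lang: str = "zu") -> str:
    """Translate an English evaluation prompt into a low-resource language
    (here Zulu, ISO code 'zu'), query the model, and translate the reply
    back to English so it can be scored against the safety criteria."""
    prompt_translated = translate(prompt_en, target_lang=lang)
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt_translated}],
    )
    reply = response.choices[0].message.content
    return translate(reply, target_lang="en")
```

The point of the loop is that the safety filters appear to be tuned primarily on high-resource languages, so the same request that is refused in English may be answered once it arrives in Zulu, Scots Gaelic, Hmong, or Guarani.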