Patronus AI finds ‘alarming’ safety gaps in leading AI systems

Patronus AI, a startup focused on responsible AI deployment, has launched a diagnostic test suite called SimpleSafetyTests to identify safety risks in large language models (LLMs). The test suite includes 100 prompts designed to probe vulnerabilities in areas such as suicide, child abuse, and physical harm. In trials, over 20% of responses from 11 popular open-source LLMs were found to be unsafe. The company suggests that adding a safety-emphasizing system prompt can reduce unsafe responses, but additional safeguards may be needed for production systems.

The SimpleSafetyTests analysis revealed significant variability across different language models. Meta’s Llama2 model generated zero unsafe responses, while other leading models like Anthropic’s Claude and Google’s PaLM faltered on over 20% of test cases. Patronus AI, which offers AI safety testing and mitigation services, emphasizes the need for rigorous safety solutions before deploying LLMs in real-world applications. The company believes that diagnostic tools like SimpleSafetyTests will be essential for ensuring the safety and quality of AI products and services.

Key takeaways:

Patronus AI has released a new diagnostic test suite called SimpleSafetyTests to identify critical safety risks in large language models (LLMs).
The SimpleSafetyTests tool uses 100 handcrafted test prompts to probe AI systems for safety risks in areas such as self-harm, physical harm, illegal items, fraud, and child abuse.
The tests revealed critical weaknesses in several popular open-source LLMs, with over 20% unsafe responses in many models. However, Meta's Llama2 model showed flawless performance, generating zero unsafe responses.
Patronus AI offers AI safety testing and mitigation services to enterprises, and believes that diagnostic tools like SimpleSafetyTests will be essential for ensuring the safety and quality of AI products and services.

Patronus AI finds ‘alarming’ safety gaps in leading AI systems

Key takeaways:

Comments (0)

Newsletter