ChatGPT is bad at avoiding copyright violations, researchers say

Patronus AI, a startup focused on evaluating and testing large language models (LLMs), has released a tool called CopyrightCatcher to detect potential copyright violations in LLMs. The company evaluated four major AI models (OpenAI’s GPT-4, Anthropic’s Claude 2.1, Mistral’s Mixtral, and Meta’s Llama 2) and found that they generate copyrighted content at an alarmingly high rate. GPT-4, the most advanced model behind ChatGPT, generated the most copyrighted content, at 44%.

The models were tested using books under copyright protection. When prompted, GPT-4 completed book texts 60% of the time and generated a book’s first passage 26% of the time. Mixtral and Llama 2 generated the first passage of books when prompted 38% and 10% of the time, respectively. Patronus AI stressed the importance of catching these mistakes to avoid legal action and risks to a company’s reputation.

Key takeaways

Research by Patronus AI found that some of the top AI models generate copyrighted content at an alarmingly high rate.
The AI models evaluated were OpenAI’s GPT-4, Anthropic’s Claude 2.1, Mistral’s Mixtral, and Meta’s Llama 2, with GPT-4 generating the most copyrighted content at 44%.
Patronus AI tested the models using books under copyright protection and noted that some generations may be covered by fair use laws in the U.S.
Patronus AI emphasized the importance of catching these copyright violations to avoid legal action and risks to a company’s reputation.
