The research comes amid a broader battle between OpenAI and publishers, authors, and artists over the use of copyrighted material for AI training data, including The New York Times' high-profile lawsuit against OpenAI. OpenAI has previously stated that it's "impossible" to train top AI models without copyrighted works, a claim this new research now calls into question.
Key takeaways:
- Patronus AI, a company specializing in evaluation and testing for large language models, has released a tool called CopyrightCatcher to detect when AI models reproduce copyrighted text in their responses (an illustrative sketch of this kind of check follows the list).
- The company tested four leading AI models (OpenAI's GPT-4, Anthropic's Claude 2, Meta's Llama 2, and Mistral AI's Mixtral) and found that all four reproduced copyrighted content to varying degrees.
- OpenAI's GPT-4 reproduced copyrighted content on 44% of prompts, the highest rate among the models tested.
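Patronus AI has not published CopyrightCatcher's internals, but the core task it describes is flagging verbatim overlap between a model's output and known copyrighted passages. The sketch below is a minimal, purely illustrative version of that kind of check, assuming a simple word n-gram comparison; the function names, n-gram length, threshold, and example strings are all hypothetical and are not drawn from Patronus AI's tool.

```python
# Purely illustrative sketch, not Patronus AI's CopyrightCatcher implementation
# (its internals are not public). All names and thresholds are hypothetical.

def word_ngrams(text: str, n: int = 10) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_ratio(model_output: str, reference_passage: str, n: int = 10) -> float:
    """Fraction of the model output's n-grams that also appear verbatim in the
    reference passage; a high value suggests the output reproduces the passage."""
    out_grams = word_ngrams(model_output, n)
    ref_grams = word_ngrams(reference_passage, n)
    if not out_grams:
        return 0.0
    return len(out_grams & ref_grams) / len(out_grams)

if __name__ == "__main__":
    # Placeholder strings stand in for a copyrighted excerpt and a model response.
    reference = "this placeholder string stands in for a longer excerpt from a copyrighted work"
    completion = "the model said: this placeholder string stands in for a longer excerpt from a copyrighted work"
    score = overlap_ratio(completion, reference)
    print(f"overlap: {score:.2f}, flagged: {score > 0.5}")  # 0.5 is an arbitrary threshold
```

A production evaluation would presumably also handle near-verbatim matches, tokenization differences, and large reference corpora, which a naive set comparison like this does not.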