The research comes amid a broader battle between OpenAI and publishers, authors, and artists over the use of copyrighted material for AI training data, including The New York Times' high-profile lawsuit against OpenAI. OpenAI has previously stated that it's "impossible" to train top AI models without copyrighted works, a claim this new research now calls into question.
Key takeaways:
- Patronus AI, a company specializing in evaluation and testing for large language models, has released a tool called CopyrightCatcher to detect when AI models reproduce copyrighted text in their responses (an illustrative sketch of this kind of check follows the list).
- The company tested four leading AI models (OpenAI's GPT-4, Anthropic's Claude 2, Meta's Llama 2, and Mistral AI's Mixtral) and found that all four reproduced copyrighted content to varying degrees.
- OpenAI's GPT-4 reproduced copyrighted content on 44% of prompts, the highest rate among the models tested.
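Patronus AI has not published CopyrightCatcher's internals, but the core task it describes is flagging verbatim overlap between a model's output and known copyrighted passages. The sketch below is a minimal, purely illustrative version of that kind of check, assuming a simple word n-gram comparison; the function names, n-gram length, threshold, and example strings are all hypothetical and are not drawn from Patronus AI's tool.

```python
# Purely illustrative sketch, not Patronus AI's CopyrightCatcher implementation
# (its internals are not public). All names and thresholds are hypothetical.

def word_ngrams(text: str, n: int = 10) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_ratio(model_output: str, reference_passage: str, n: int = 10) -> float:
    """Fraction of the model output's n-grams that also appear verbatim in the
    reference passage; a high value suggests the output reproduces the passage."""
    out_grams = word_ngrams(model_output, n)
    ref_grams = word_ngrams(reference_passage, n)
    if not out_grams:
        return 0.0
    return len(out_grams & ref_grams) / len(out_grams)

if __name__ == "__main__":
    # Placeholder strings stand in for a copyrighted excerpt and a model response.
    reference = "this placeholder string stands in for a longer excerpt from a copyrighted work"
    completion = "the model said: this placeholder string stands in for a longer excerpt from a copyrighted work"
    score = overlap_ratio(completion, reference)
    print(f"overlap: {score:.2f}, flagged: {score > 0.5}")  # 0.5 is an arbitrary threshold
```

A production evaluation would presumably also handle near-verbatim matches, tokenization differences, and large reference corpora, which a naive set comparison like this does not.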