The New York Times has sued OpenAI, alleging that its ChatGPT text model reproduces near-verbatim copies of the newspaper's paywalled articles; book authors and software developers have raised similar claims. OpenAI, which itself profits from subscription fees, has defended its practices, arguing that it collaborates with news organizations, that training on copyrighted data qualifies as fair use, and that any infringement is a "bug". However, the company has also warned that its models cannot work without being trained on copyrighted content, a claim that has drawn skepticism and criticism.
Key takeaways:
- OpenAI has stated that it would be "impossible" to build top-tier neural networks without using copyrighted work, asserting that relying solely on out-of-copyright public domain material would result in sub-par AI software.
- A recent study has documented instances of "plagiaristic outputs" in which AI services such as OpenAI's DALL-E 3 and Midjourney render substantially similar versions of scenes from films, images of famous actors, and video game content, likely because they were trained on copyrighted material.
- OpenAI and Midjourney have been accused of producing material that infringes copyrights and trademarks, profiting from it through subscription fees, and failing to inform users about the risk of infringement.
- OpenAI has defended its practices, arguing that training on copyrighted data qualifies for the fair use defense under copyright law, and that any "regurgitation" of copyrighted content is a rare bug that it is working to eliminate.