The New York Times has sued OpenAI, alleging that its ChatGPT text model reproduces near-verbatim copies of the newspaper's paywalled articles; book authors and software developers have raised similar claims. OpenAI, which itself profits from subscription fees, has defended its practices, arguing that it collaborates with news organizations, that training on copyrighted data qualifies as fair use, and that any infringement is a "bug". However, the company has also warned that its models cannot work without being trained on copyrighted content, a claim that has drawn skepticism and criticism.
Key takeaways:
- OpenAI has stated that it would be "impossible" to build top-tier neural networks without using copyrighted work, asserting that relying solely on out-of-copyright public domain material would result in sub-par AI software.
- A recent study has documented instances of "plagiaristic outputs" in which AI services such as OpenAI's DALL-E 3 and Midjourney render substantially similar versions of scenes from films, images of famous actors, and video game content, likely because they were trained on copyrighted material.
- OpenAI and Midjourney have been accused of producing material that infringes copyrights and trademarks, profiting from it through subscription fees, and failing to inform users about the risk of infringement.
- OpenAI has defended its practices, arguing that training on copyrighted data qualifies for the fair use defense under copyright law, and that any "regurgitation" of copyrighted content is a rare bug that it is working to eliminate.