
Generative AI Has a Visual Plagiarism Problem

Jan 07, 2024 - spectrum.ieee.org
The article discusses the potential copyright issues associated with generative AI systems such as OpenAI's GPT-4 and Midjourney V6. The authors argue that these systems, trained on vast amounts of data, can sometimes reproduce copyrighted content, such as text from articles or images of trademarked characters, without the user's knowledge. This could expose users to copyright infringement claims. The authors suggest that the only ethical solution is for AI developers to limit their training to data they have properly licensed and to be transparent about their data sources.

The authors also highlight the lack of transparency from AI developers about their training data and the resulting potential for litigation. They argue that the burden of avoiding copyright infringement is unfairly placed on the user, since the AI systems provide no information about the provenance of the images they produce. The authors call for AI developers to document their data sources more carefully, restrict training to properly licensed data, include artists' work only with their consent, and compensate artists for their work.

Key takeaways:

  • Large language models (LLMs) such as OpenAI's GPT-4 have been shown, including in research by Google DeepMind, to "memorize" and reproduce substantial chunks of text from their training sets, raising concerns about potential copyright infringement.
  • Generative AI systems like Midjourney V6 and OpenAI's DALL-E 3 have been found to produce near-verbatim or "plagiaristic" visual outputs based on copyrighted materials, even without direct prompts to do so.
  • These findings suggest that generative AI developers may be training their systems on copyrighted materials without proper licensing or transparency, potentially exposing users to copyright infringement claims.
  • The authors argue that the only ethical solution is for generative AI systems to limit their training to data they have properly licensed, and to be transparent about their data sources.
