The paper refers to AI models responding with copyrighted material as "leakage" and suggests that users who prompt these models to reproduce copyrighted work are "misusing" the technology. It also commends ChatGPT's efforts to detect maliciously designed prompts and withhold the copyrighted text it was trained on, citing this as a positive example of how large language models can protect copyrighted content.
Key takeaways:
- OpenAI's ChatGPT and other large language models (LLMs) have been trained on vast amounts of data, including copyrighted books, leading to increased scrutiny and lawsuits from authors.
- In response to this scrutiny, OpenAI and other companies, including Google, Meta, and Microsoft, have stopped disclosing what data their AI models are trained on.
- A new research paper suggests that ChatGPT now tries to avoid responding to user prompts with exact phrasing from copyrighted works; despite these efforts, it still reproduced copyrighted material.