Boffins find asking ChatGPT to repeat key words can expose its training data

Researchers have discovered that the AI chatbot, ChatGPT, can be prompted to reveal snippets of its training data by asking it to repeat a single word repeatedly. This unusual method, known as a divergence attack, can result in the chatbot generating random text, some of which appears to be directly lifted from its training data, including potentially sensitive or private information. The team found that only about 3% of the generated text is memorized from its training data, and certain words are more effective than others in triggering this response.

The researchers disclosed their findings to OpenAI, the organization behind ChatGPT, but at the time of writing, the issue has not been patched. The team used about 10 TB of text data to match against the chatbot's outputs and found over 10,000 examples from ChatGPT's training dataset. They believe that it's possible to extract gigabytes of training data from the chatbot and hope their findings will encourage responsible model deployment in the future.

Key takeaways:

A team of researchers discovered that ChatGPT, a large language model, can be prompted to regurgitate snippets of text from its training data by asking it to repeat a single word over and over again.
The researchers found that some of the text generated by ChatGPT in this manner appears to be directly copied from real text that has previously been published, revealing traces of the resources it was trained on.
The team managed to extract various types of training data from ChatGPT using this method, including personal identifiable information, code, explicit content, account information, and abstracts from research papers.
The researchers disclosed their findings to OpenAI and published their results 90 days later, but at the time of writing, the issue has not been patched.

Boffins find asking ChatGPT to repeat key words can expose its training data

Key takeaways:

Comments (0)

Newsletter