
Reddit's upcoming changes attempt to safeguard the platform against AI crawlers | TechCrunch

Jun 25, 2024 - techcrunch.com
Reddit is updating its robots.txt file, which implements the Robots Exclusion Protocol, to limit and block unknown web bots from crawling its site. The move responds to the rise of AI companies that scrape websites to train their models without acknowledging the source. Reddit will continue to enforce its Public Content Policy, and bots that do not comply will be rate-limited or blocked. The update is not expected to affect most users or good-faith actors such as researchers and organizations like the Internet Archive; it is aimed at deterring AI companies from using Reddit content to train large language models.

The announcement follows a recent investigation that found AI-powered search startup Perplexity had been scraping content despite being blocked in sites' robots.txt files. Reddit's changes will not affect companies it has agreements with, such as Google, which has a $60 million deal to train its AI models on Reddit content. The update signals that companies wishing to use Reddit's data for AI training will need to pay. Reddit emphasized that anyone accessing its content must abide by its policies and that it is selective about whom it grants large-scale access to that content.
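A robots.txt file is advisory: it only works when a crawler checks it before fetching a page, which is why Reddit pairs it with rate-limiting and blocking. The sketch below shows how a compliant crawler might perform that check using Python's standard `urllib.robotparser`. The rules, bot names, and URL are hypothetical and chosen for illustration; they are not Reddit's actual robots.txt or the names of any real crawlers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT Reddit's actual robots.txt.
# One named bot (standing in for a crawler with an agreement) is allowed,
# and every other user agent is disallowed from the whole site.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExamplePartnerBot
Allow: /

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(EXAMPLE_ROBOTS_TXT)
page = "https://example.com/r/some_page"  # placeholder URL

print(may_fetch(parser, "ExamplePartnerBot", page))  # True
print(may_fetch(parser, "UnknownCrawler", page))     # False
```

Nothing in this check stops a crawler from skipping it and requesting the page anyway, so server-side enforcement is still needed for bots that ignore the file.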

Key takeaways:

  • Reddit is updating its robots.txt file, which implements the Robots Exclusion Protocol, to control how automated web bots crawl its site, in response to AI companies using scraped data to train their models without acknowledging the source.
  • Along with the updated robots.txt file, Reddit will continue rate-limiting and blocking unknown bots and crawlers that neither abide by Reddit’s Public Content Policy nor have an agreement with the platform.
  • The update is not expected to affect the majority of users or good-faith actors; it is aimed at deterring AI companies from training their large language models on Reddit content without permission.
  • Reddit has a $60 million deal with Google that allows the search giant to train its AI models on the social platform’s content, signaling that companies wishing to use Reddit’s data for AI training will need to pay.