The announcement follows a recent investigation that found AI-powered search startup Perplexity had been scraping content despite being blocked in the robots.txt file and asked not to. Reddit's changes will not affect companies it has agreements with, such as Google, which has a $60 million deal to train its AI models on Reddit content. The update signals that companies wishing to use Reddit's data for AI training will need to pay. Reddit emphasized that anyone accessing its content must abide by its policies and that it is selective about whom it grants large-scale access.
Key takeaways:
- Reddit is updating its Robots Exclusion Protocol (robots.txt file) to control how automated web bots crawl its site, in response to AI companies using scraped data to train their models without acknowledging the source.
- Along with the updated robots.txt file, Reddit will continue rate-limiting and blocking unknown bots and crawlers that do not abide by Reddit’s Public Content Policy or have an agreement with the platform.
- The update is not expected to affect the majority of users or good-faith actors, but it is aimed at deterring AI companies from training their large language models on Reddit content without permission.
- Reddit has a $60 million deal with Google that allows the search giant to train its AI models on the social platform’s content, signaling that companies wishing to use Reddit’s data for AI training will need to pay.
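The robots.txt mechanism mentioned above is advisory: a well-behaved crawler fetches the file and checks each URL against it before scraping, which is exactly what Perplexity was reported not to do. A minimal sketch of that check using Python's standard-library `urllib.robotparser`, with a hypothetical blanket-disallow robots.txt (not Reddit's actual file):

```python
from urllib import robotparser

# Hypothetical robots.txt content for illustration -- not Reddit's actual file.
# "User-agent: *" with "Disallow: /" blocks all compliant crawlers from every path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A compliant crawler calls can_fetch() before requesting any URL.
allowed = rp.can_fetch("ExampleBot", "https://example.com/r/some-subreddit/")
print(allowed)  # False: the blanket Disallow rule blocks this bot
```

Because the protocol relies entirely on the crawler voluntarily honoring this check, Reddit pairs it with server-side enforcement (rate-limiting and blocking), as the takeaways note.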