Google Lets Publishers Opt Out of AI Training Data

Google has introduced Google-Extended, a new control that lets publishers opt out of having their content used to train Google's AI models while still allowing their sites to be crawled and indexed for search by crawlers like Googlebot. However, some critics, such as Alex Berger from Adform, have called the move "shady," arguing that Google is essentially commercializing publishers' content without their consent and then threatening to penalize them if they opt out.

The tool has put publishers in a difficult position: blocking Google's crawlers could keep their content out of search results, a key source of organic traffic and revenue. Some publishers, such as The New York Times, have updated their Terms of Service to prohibit the scraping of their content for machine learning or AI training. Others, like CNN and Reuters, have blocked OpenAI's web crawler to prevent their content from being scraped and used to train ChatGPT.

Key takeaways:

Google's new tool, Google-Extended, lets publishers opt out of having their data used to train AI models while their websites continue to be indexed by Google's search engine.
Some industry professionals, like Alex Berger from Adform, have criticized this move, arguing that Google is essentially commercializing content without consent and potentially penalizing those who opt out.
Google-Extended is controlled through a site's robots.txt file, but blocking Google's web crawlers could lead to a loss in search result visibility, impacting organic traffic and revenue.
Several publishers, including The New York Times, CNN, and Reuters, have updated their terms of service or blocked web crawlers to prevent their content from being used to train AI systems.
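
As a rough illustration of the robots.txt opt-out described above, the sketch below checks a hypothetical robots.txt with Python's standard urllib.robotparser module. The file contents, the example.com URL, and the helper name `allowed` are assumptions made for illustration. `Google-Extended` is the token named in the article, and `GPTBot` is assumed here to be the user agent of the OpenAI crawler the article mentions. Python's parser only approximates how real crawlers interpret these rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of AI training crawlers/tokens
# while leaving ordinary search crawling (e.g. Googlebot) untouched.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/news/article.html"  # placeholder URL
    for agent in ("Googlebot", "Google-Extended", "GPTBot"):
        print(f"{agent}: {'allowed' if allowed(ROBOTS_TXT, agent, url) else 'blocked'}")
```

With this file, the parser should report Googlebot as allowed and the other two agents as blocked, which mirrors the trade-off in the article: a publisher can decline AI training use without removing the site from search, as long as the search crawler itself is not blocked.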
