The tool puts publishers in a difficult position: blocking Google's crawlers outright could keep their content out of search results, a key source of organic traffic and revenue. Some publishers, such as The New York Times, have updated their Terms of Service to prohibit the scraping of their content for machine learning or AI training. Others, like CNN and Reuters, have blocked OpenAI's web crawler, GPTBot, to prevent their content from being scraped and used to train ChatGPT.
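Blocking a crawler of this kind is done with standard robots.txt directives. As a minimal sketch, assuming a publisher wants to shut out OpenAI's crawler entirely (GPTBot is the user agent token OpenAI documents for it):

```
# Block OpenAI's GPTBot from crawling any path on the site,
# keeping the content out of ChatGPT's training data
User-agent: GPTBot
Disallow: /
```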
Key takeaways:
- Google's new tool, Google-Extended, lets publishers opt out of their data being used to train Google's AI models, such as Bard and Vertex AI, while their sites remain indexed by its search engine.
- Some industry professionals, like Alex Berger from Adform, have criticized this move, arguing that Google is essentially commercializing content without consent and potentially penalizing those who opt out.
- Google-Extended is set via the robots.txt file, but blocking Google's web crawlers outright could cost a site visibility in search results, hurting organic traffic and revenue (see the robots.txt sketch after this list).
- Several publishers, including The New York Times, CNN, and Reuters, have taken legal measures or blocked web crawlers to prevent their content from being used to train AI systems without proper attribution.
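The following robots.txt sketch shows how the two controls sit side by side; the Google-Extended token is the one Google documents for its AI-training opt-out, while the Googlebot entry is included only to illustrate that search crawling is governed separately:

```
# Opt out of content being used to train Google's AI models
# (e.g. Bard and Vertex AI) via the Google-Extended token
User-agent: Google-Extended
Disallow: /

# Googlebot handles Search indexing and follows its own rules;
# allowing it keeps the site eligible to appear in search results
User-agent: Googlebot
Allow: /
```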