
Perplexity AI Is Lying about Their User Agent

Jun 15, 2024 - rknight.me
The author discusses his attempts to block AI bots, specifically PerplexityBot, from accessing and summarizing content on his website. Despite disallowing PerplexityBot in his robots.txt file and adding server-side blocking in nginx, he found that Perplexity was still able to access and summarize his posts. He tested his blocking setup using the user agent Perplexity claimed to use and confirmed that the blocking was functional. However, upon further investigation, he discovered that Perplexity was using a different user agent string and was using headless browsers to scrape content, thereby bypassing the robots.txt restrictions.
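As a rough illustration of why a robots.txt rule only helps when a crawler identifies itself, the following minimal sketch uses Python's standard `urllib.robotparser` to evaluate a rule that disallows `PerplexityBot`. The robots.txt contents, the `example.com` URL, and the generic browser user agent string are assumptions made for the example; they are not taken from the author's site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, assumed for illustration only.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /
"""

# Hypothetical generic browser user agent, assumed for illustration only.
GENERIC_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/111.0.0.0 Safari/537.36"


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/some-post"
    # A crawler that announces itself by its product token is disallowed.
    print("PerplexityBot allowed:", is_allowed("PerplexityBot", url))
    # The same rule says nothing about a generic browser user agent.
    print("Generic UA allowed:", is_allowed(GENERIC_UA, url))
```

Even in the best case, robots.txt is advisory: it only expresses a request, and it can only be matched against the name a client chooses to report.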

The author expresses frustration at the unethical behavior of AI companies like Perplexity, which ignore robots.txt and do not send their correct user agent, making it impossible to block them. He has reported the issue on Perplexity's Discord channel and is considering further action, possibly involving GDPR. He is adamant about not wanting his content to be freely used by AI companies.

Key takeaways:

  • The author has been trying to block AI bots, specifically PerplexityBot, from accessing his website.
  • Despite setting up server-side blocking and disallowing the bot in the robots.txt file, the bot was still able to access and summarize content from the site.
  • Perplexity claims it cannot crawl websites or bypass robots.txt restrictions, yet it was found fetching pages with a generic user agent string instead of identifying itself as 'PerplexityBot'.
  • The author is considering further action, such as a GDPR request, to prevent their content from being accessed by AI companies without permission.
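
The server-side half of the setup can be sketched in the same spirit. This is not the author's nginx configuration; it is a hypothetical Python stand-in for a user-agent block rule, with the pattern, the 403/200 status codes, and the sample user agent strings all assumed for illustration.

```python
import re

# Hypothetical blocklist pattern; a real server rule would likely list several bots.
BLOCKED_AGENTS = re.compile(r"PerplexityBot", re.IGNORECASE)


def status_for(user_agent: str) -> int:
    """Return the HTTP status a user-agent block rule would produce.

    403 is assumed for blocked agents and 200 for everything else.
    """
    return 403 if BLOCKED_AGENTS.search(user_agent) else 200


if __name__ == "__main__":
    # Assumed sample strings, not captured from real traffic.
    declared = "Mozilla/5.0 (compatible; PerplexityBot/1.0)"
    generic = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/111.0.0.0 Safari/537.36"
    print(status_for(declared))
    print(status_for(generic))
```

A rule like this behaves correctly when tested with the declared user agent, which matches what the author observed, but it has nothing to match against when the request arrives looking like an ordinary browser.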