
Bluesky's Open API Means Anyone Can Scrape Your Data for AI Training. It's All Public - Slashdot

Dec 02, 2024 - tech.slashdot.org
Bluesky, a social media platform, has come under scrutiny after one million public posts, including user information, were crawled and uploaded to AI company Hugging Face, even though Bluesky itself says it does not train AI on user data. The data was later removed, and the person at Hugging Face who scraped it apologized, acknowledging that the dataset violated principles of transparency and consent. TechCrunch noted that because Bluesky's API is open and all posts are public, anyone can scrape user data for AI training.

In response, Bluesky stated it is exploring ways to let users signal their consent preferences to outside parties, but admitted it cannot enforce those preferences outside its own systems. The company is in discussions with engineers and lawyers to address this issue. Bluesky compared its platform to a website: just as a robots.txt file does not always stop outside companies from crawling a site, a consent signal on Bluesky would depend on outside parties choosing to respect it. The incident sparked debate over whether data collection should be opt-in and whether publicly available data can be considered fair use.
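The robots.txt comparison can be made concrete with a minimal sketch in Python, using the standard library's `urllib.robotparser`. The robots.txt content, the `ExampleAIBot` user agent, the `is_allowed` helper, and the example.com URL are all hypothetical and are not taken from Bluesky or the article. The sketch only shows that the file is something a crawler chooses to consult: nothing in it blocks a crawler that skips the check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: asks one AI crawler to stay out, allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return what robots.txt *requests* for this user agent and URL.

    This is purely advisory: a crawler has to call a check like this
    voluntarily and then honor the answer.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/post/1"  # hypothetical URL
    for agent in ("ExampleAIBot", "OtherBot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, url))
```

A well-behaved crawler would run a check like this before each request and skip disallowed URLs. A crawler that never runs it faces no technical barrier, which is the limitation Bluesky describes for any consent signal it might add.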

Key takeaways:

  • Despite Bluesky's claim of not training AI on user data, one million public Bluesky posts were crawled and uploaded to AI company Hugging Face, which later removed the data and apologized for the violation of transparency and consent principles.
  • TechCrunch pointed out that Bluesky's open API allows anyone to scrape user data for AI training, highlighting the public nature of all posts on the platform.
  • Bluesky is exploring ways to allow users to communicate their consent preferences externally, but admits it cannot enforce this consent outside of its own systems.
  • Bluesky compared its situation to that of a website, where a robots.txt file does not always prevent outside companies from crawling the site. The incident led to a debate over whether data collection should be opt-in or whether public Bluesky data is fair use.