Someone Made a Dataset of One Million Bluesky Posts for 'Machine Learning Research'
Nov 27, 2024 - 404media.co
A machine learning librarian at Hugging Face has released a dataset of one million Bluesky posts for machine learning research. The dataset, announced by Daniel van Strien on Bluesky, was collected from Bluesky's firehose API and includes post text, metadata, language predictions, and information about media attachments and reply relationships.
The dataset is intended for machine learning research and experimentation with social media data.
Key takeaways:
A machine learning librarian at Hugging Face has released a dataset of one million Bluesky posts for machine learning research.
The dataset records when each post was made and who posted it.
Each post in the dataset contains text content, metadata, and information about media attachments and reply relationships.
Daniel van Strien posted about the dataset on Bluesky, providing more details about its content and purpose.