The use of these images poses significant privacy risks and makes it more likely that non-consensual AI-generated images bearing the children's likenesses will be created. Although LAION has worked with HRW to remove the links to the children's images from the dataset, the removed links are likely a significant undercount of the total amount of children's personal data in LAION-5B. Furthermore, removing the links does not erase the images from the public web, where they can still be referenced and used in other AI datasets.
Key takeaways:
- Photos of Brazilian children, sometimes spanning their entire childhood, have been used without their consent to power AI tools, posing urgent privacy risks, according to Human Rights Watch (HRW).
- The photos were found in LAION-5B, a dataset built from Common Crawl snapshots of the public web. It contains image-text pairs derived from 5.85 billion images and captions posted online since 2008.
- LAION, the German nonprofit that created the dataset, has worked with HRW to remove the links to the children's images from the dataset, but concerns remain that it may still reference personal photos of children from around the world.
- In Brazil, at least 85 girls have reported harassment from classmates who used AI tools to create sexually explicit deepfakes based on photos taken from their social media profiles. The fabricated images cause lasting harm and could remain online for the rest of the girls' lives.