The German nonprofit has since taken multiple versions of the dataset offline and has released filters for finding and removing illegal content. One of the companies that used LAION-5B to train its neural networks, Stability AI Ltd., has stated that version 2.0 of its Stable Diffusion model was trained on a subset of the dataset with less unsafe content. This is not the first time LAION-5B has faced scrutiny: it was previously involved in a lawsuit over the alleged use of copyrighted images and was found to contain an artist's private medical photos.
Key takeaways:
- Researchers from the Stanford Internet Observatory (SIO) have found over 1,000 child sexual abuse images in the LAION-5B AI training dataset.
- The illegal images were identified through hashing, a technique that reduces each image to a fixed fingerprint and matches it against databases of known abuse material (see the sketch after this list); the researchers have reported the offending image URLs to the relevant authorities for removal.
- LAION-5B, released by a German nonprofit in early 2022, comprises over 5 billion image-text pairs, stored as URLs and captions scraped from the web, and has been used to train multiple image generation models.
- This is not the first time the LAION-5B dataset has come under scrutiny, with previous issues including a lawsuit over the alleged use of copyrighted images and the discovery of an artist's private medical photos among the files.
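To illustrate the hash-matching approach mentioned above: each image is reduced to a digest, and that digest is compared against a list of digests of known illegal material, so flagged content can be reported without anyone needing to view it. The sketch below is a minimal, hypothetical version using MD5 digests; real detection pipelines typically rely on perceptual hashes such as PhotoDNA, whose databases are not public, and the names `KNOWN_BAD_MD5` and `flag_if_known` are placeholders, not part of any published tooling.

```python
import hashlib

# Hypothetical set of hex digests of known illegal images, as would be
# distributed by a clearinghouse (placeholder value for illustration only).
KNOWN_BAD_MD5 = {
    "d41d8cd98f00b204e9800998ecf8427e",
}


def md5_digest(image_bytes: bytes) -> str:
    """Return the MD5 hex digest (fingerprint) of raw image bytes."""
    return hashlib.md5(image_bytes).hexdigest()


def flag_if_known(image_bytes: bytes, url: str) -> bool:
    """Return True if the image's fingerprint matches a known-bad hash.

    Only the hash is compared; the matching process never requires
    storing or viewing the image content itself.
    """
    if md5_digest(image_bytes) in KNOWN_BAD_MD5:
        print(f"match: {url} flagged for reporting and removal")
        return True
    return False
```

Because a dataset like LAION-5B stores URLs rather than images, a scan of this kind would fetch each linked image, hash it, and discard the bytes, keeping only the URLs of any matches to pass to the relevant authorities.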