38TB of data accidentally exposed by Microsoft AI researchers

Microsoft's AI research team accidentally exposed 38 terabytes of private data, including a disk backup of two employees' workstations, while publishing open-source training data on GitHub. The data included secrets, private keys, passwords, and over 30,000 internal Microsoft Teams messages. The researchers shared their files using an Azure feature called SAS (Shared Access Signature) tokens, but the link was configured to share the entire storage account rather than only the intended files. The incident illustrates a risk organizations take on when adopting AI: data scientists and engineers handle very large amounts of data, and sharing it safely requires additional security checks and safeguards.

The Wiz Research Team discovered the data exposure while scanning the internet for misconfigured storage containers. The exposed storage URL was taken from Microsoft’s GitHub repository and was configured to grant permissions on the entire storage account. The account contained sensitive personal data, including passwords to Microsoft services, secret keys, and internal Microsoft Teams messages from 359 Microsoft employees. The token was also misconfigured to allow “full control” permissions instead of read-only, meaning an attacker could view, delete, and overwrite existing files. The incident underscores the security risks associated with the use of SAS tokens and the need for improved security measures in the AI development process.
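The difference between a read-only token and a "full control" token is visible in the URL itself: the permissions are carried as a string of letters in the `sp` query parameter. The following is a minimal sketch, not a tool from the original research, that decodes those letters. The URLs and the `describe_permissions` helper are hypothetical, and the letter table covers only the common permissions.

```python
from urllib.parse import urlsplit, parse_qs

# Common SAS permission letters carried in the `sp` query parameter.
# This table is a subset chosen for illustration, not the full list.
PERMISSION_NAMES = {
    "r": "read", "w": "write", "d": "delete", "l": "list",
    "a": "add", "c": "create", "u": "update", "p": "process",
}

def describe_permissions(sas_url):
    """Return (permission names, is_read_only) for a SAS URL."""
    query = parse_qs(urlsplit(sas_url).query)
    letters = query.get("sp", [""])[0]
    names = [PERMISSION_NAMES.get(ch, f"unknown({ch})") for ch in letters]
    # Treat a token as read-only if it grants nothing beyond read and list.
    read_only = bool(letters) and not (set(letters) - {"r", "l"})
    return names, read_only

# Hypothetical URLs: the account, dates, and signatures are made up.
OVERLY_BROAD = (
    "https://example.blob.core.windows.net/"
    "?sv=2020-08-04&ss=b&srt=sco&sp=rwdlacup"
    "&se=2050-01-01T00:00:00Z&sig=REDACTED"
)
READ_ONLY = (
    "https://example.blob.core.windows.net/data/model.ckpt"
    "?sv=2020-08-04&sr=b&sp=r&se=2023-09-20T00:00:00Z&sig=REDACTED"
)

if __name__ == "__main__":
    for url in (OVERLY_BROAD, READ_ONLY):
        print(describe_permissions(url))
```

A token whose `sp` value includes `w` or `d` lets anyone holding the URL overwrite or delete files, which is the situation described above.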

Key takeaways:

Microsoft's AI research team accidentally exposed 38 terabytes of private data, including employee workstation backups, passwords, and internal messages, due to a misconfigured Azure SAS token.
The SAS token, which was intended to share only open-source training data, instead granted access to the entire storage account, illustrating the risks organizations face when handling large amounts of data for AI.
SAS tokens pose a security risk due to their ability to grant high access levels, their potential for infinite expiry, and the difficulty in tracking and revoking them.
Security recommendations include avoiding Account SAS for external sharing, using Service SAS with a Stored Access Policy or User Delegation SAS for time-limited sharing, creating dedicated storage accounts for external sharing, and using secret scanning tools to detect leaked or over-privileged SAS tokens.
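As a rough illustration of the last recommendation, the sketch below audits a SAS URL for the three properties called out above: account-wide scope, long or missing expiry, and permissions beyond read and list. It is an assumption-laden example rather than a real scanning tool: `audit_sas_url`, the 7-day threshold, and the example URLs are all hypothetical. It relies on the documented SAS query parameters `ss` and `srt` (present on an Account SAS), `si` (a reference to a Stored Access Policy), `se` (expiry), `sp` (permissions), and `sig` (signature).

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlsplit, parse_qs

def parse_expiry(value):
    """Parse the ISO 8601 `se` value into a timezone-aware datetime."""
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if dt.tzinfo is None:  # date-only values carry no zone; assume UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return dt

def audit_sas_url(sas_url, now, max_lifetime=timedelta(days=7)):
    """Return a list of findings; an empty list means nothing was flagged."""
    params = {k: v[0] for k, v in parse_qs(urlsplit(sas_url).query).items()}
    if "sig" not in params:
        return ["not a SAS URL (no sig parameter)"]
    findings = []
    # `ss` and `srt` only appear on an Account SAS.
    if "ss" in params and "srt" in params:
        findings.append("account SAS: scope is the whole storage account")
    if "se" in params:
        try:
            if parse_expiry(params["se"]) - now > max_lifetime:
                findings.append("expiry exceeds the allowed lifetime")
        except ValueError:
            findings.append("expiry could not be parsed")
    elif "si" not in params:
        # With a stored access policy (`si`), expiry may live in the policy.
        findings.append("no expiry and no stored access policy")
    if set(params.get("sp", "")) - {"r", "l"}:
        findings.append("permissions go beyond read/list")
    return findings

# Hypothetical URLs and audit date, for illustration only.
AUDIT_TIME = datetime(2023, 9, 18, tzinfo=timezone.utc)
RISKY_URL = (
    "https://example.blob.core.windows.net/"
    "?sv=2020-08-04&ss=b&srt=sco&sp=rwdlacup"
    "&se=2050-01-01T00:00:00Z&sig=REDACTED"
)
SCOPED_URL = (
    "https://example.blob.core.windows.net/data/model.ckpt"
    "?sv=2020-08-04&sr=b&sp=r&se=2023-09-20T00:00:00Z&sig=REDACTED"
)

if __name__ == "__main__":
    for url in (RISKY_URL, SCOPED_URL):
        print(audit_sas_url(url, AUDIT_TIME))
```

A check like this only catches what is visible in the URL. It cannot tell whether a token has already been revoked, which is part of why SAS tokens are hard to track.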

38TB of data accidentally exposed by Microsoft AI researchers | Wiz Blog
