Cisco's research highlights the vulnerabilities introduced by fine-tuning LLMs, especially in sensitive domains like healthcare and law. The study reveals that fine-tuned models are significantly more likely to produce harmful outputs than base models. Additionally, the article addresses the threat of dataset poisoning, where attackers can inject malicious data into open-source training sets for as little as $60, and decomposition attacks that extract copyrighted content without triggering guardrails. The findings emphasize the need for stronger security measures and real-time visibility to protect against these evolving threats, as LLMs become a critical attack surface in enterprise environments.
Key takeaways:
- Weaponized large language models (LLMs) like FraudGPT, GhostGPT, and DarkGPT are being used for cyberattacks, with leasing prices as low as $75 a month, and are packaged similarly to legitimate SaaS applications.
- Fine-tuning LLMs increases their vulnerability to attacks, as it weakens safety controls and makes them more susceptible to producing harmful outputs, with a 22-fold increase in risk compared to base models.
- Data poisoning attacks can be executed for as little as $60, allowing adversaries to inject malicious data into open-source training sets, potentially influencing downstream LLMs and compromising AI supply chains.
- Decomposition attacks can extract copyrighted and regulated content from LLMs without triggering guardrails, posing significant compliance risks for enterprises in regulated sectors like healthcare, finance, and legal.