The article also provides detailed instructions on how to use `garak`, including how to specify a generator, run probes, and read the results. It also provides examples of probing different models for vulnerabilities. Additionally, it explains how to develop your own plugin for `garak` and how to cite the tool in research. The tool is open-source and the developers encourage contributions to add functionality and support applications.
Key takeaways:
- `garak` is a vulnerability scanner for Large Language Models (LLMs) developed by NVIDIA. It checks for potential weaknesses such as hallucination, data leakage, prompt injection, misinformation, toxicity generation, jailbreaks, and more.
- The tool uses a combination of static, dynamic, and adaptive probes to identify potential vulnerabilities in an LLM or dialog system. It is free to use and the developers are open to adding more functionality to support applications.
- `garak` supports a wide range of models and can be installed via pip or cloned from the source. It also provides detailed logging, including a log file, a report of the current run, and a hit log detailing attempts that yielded a vulnerability.
- Developers can create their own plugins for `garak` by inheriting from one of the base classes and overriding as little as possible. The tool also provides a list of all the plugins of the type you're writing, with `--list_probes`, `--list_detectors`, or `--list_generators`.