How garak runs

In a typical run, garak will read a model type (and optionally model name) from the command line, then determine which probe and detector plugins to run, start up a generator, and then pass these to a harness to manage the probing; an evaluator deals with the results. There are many modules in each of these categories, and each module provides a number of classes that act as individual plugins.

  • garak/probes/ - classes for generating interactions with LLMs

  • garak/detectors/ - classes for detecting an LLM is exhibiting a given failure mode

  • garak/evaluators/ - assessment reporting schemes

  • garak/generators/ - plugins for LLMs to be probed

  • garak/harnesses/ - classes for structuring testing

  • garak/buffs - classes for augmenting / fuzzing attacks

  • resources/ - ancillary items required by plugins

The default operating mode is to use the garak.harnesses.probewise harness. Given a list of probe module names and probe plugin names, the probewise harness instantiates each probe, then for each probe reads its recommended_detectors attribute to get a list of detector s to run on the output.

Each plugin category (probes, detectors, evaluators, generators, harnesses) includes a which defines the base classes usable by plugins in that category. Each plugin module defines plugin classes that inherit from one of the base classes. For example, garak.generators.openai.OpenAIGenerator descends from garak.generators.base.Generator.

Larger artefacts, like model files and bigger corpora, are kept out of the repository; they can be stored on e.g. Hugging Face Hub and loaded locally by clients using garak.