Top-level concepts in garak

What are we doing here, and how does it all fit together? Our goal is to test the security of something that takes prompts and returns text. garak has a few constructs used to simplify and organise this process.

generators

<generators> wrap a target LLM or dialogue system. They take a prompt and return the output. The rest is abstracted away. Generator classes deal with things like authentication, loading, connection management, backoff, and all the behind-the-scenes things that need to happen to get that prompt/response interaction working.

probes

<probes> tries to exploit a weakness and elicit a failure. The probe manages all the interaction with the generator. It determines how often to prompt, and what the content of the prompts is. Interaction between probes and generators is mediated in an object called an attempt.

attempt

An <attempt> represents one unique try at breaking the target. A probe wraps up each of its adversarial interactions in an attempt object, and passes this to the generator. The generator adds responses into the attempt and sends the attempt back. This is logged in garak reporting which contains (among other things) JSON dumps of attempts.

Once the probe is done with the attempt and the generator has added its outputs, the outputs are examined for signs of failures. This is done in a detector.

detectors

<detectors> attempt to identify a single failure mode. This could be for example some unsafe contact, or failure to refuse a request. Detectors do this by examining outputs that are stored in a prompt, looking for a certain phenomenon. This could be a lack of refusal, or continuation of a string in a certain way, or decoding an encoded prompt, for example.

buffs

<buffs> adjust prompts before they’re sent to a generator. This could involve translating them to another language, or adding paraphrases for probes that have only a few, static prompts.

evaluators

When detectors have added judgments to attempts, <evaluators> converts the results to an object containing pass/fail data for a specific probe and detector pair.

harnesses

The <harnesses> manage orchestration of a garak run. They select probes, then detectors, and co-ordinate running probes, passing results to detectors, and doing the final evaluation

Functions for working with garak plugins (enumeration, loading, etc)

class garak._plugins.PluginCache

Bases: object

instance() dict
plugin_info() dict

retrieves the standard attributes for the plugin type

class garak._plugins.PluginEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)

Bases: JSONEncoder

default(obj)

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return super().default(o)
class garak._plugins.PluginProvider

Bases: object

Central registry of plugin instances

Newly requested plugins are first checked against this Provider for duplication.

static getInstance(klass_def, config_root)
static storeInstance(plugin, config_root)
garak._plugins.enumerate_plugins(category: str = 'probes', skip_base_classes=True) List[tuple[str, bool]]

A function for listing all modules & plugins of the specified kind.

garak’s plugins are organised into four packages - probes, detectors, generators and harnesses. Each package contains a base module defining the core plugin classes. The other modules in the package define classes that inherit from the base module’s classes.

enumerate_plugins() works by first looking at the base module in a package and finding the root classes here; it will then go through the other modules in the package and see which classes can be enumerated from these.

Parameters:

category (str) – the name of the plugin package to be scanned; should be one of probes, detectors, generators, or harnesses.

garak._plugins.load_plugin(path, break_on_fail=True, config_root=<module 'garak._config' from '/home/docs/checkouts/readthedocs.org/user_builds/garak/checkouts/latest/docs/source/../../garak/_config.py'>) object

load_plugin takes a path to a plugin class, and attempts to load that class. If successful, it returns an instance of that class.

Parameters:
  • path (str) – The path to the class to be loaded, e.g. “probes.test.Blank”

  • break_on_fail (bool) – Should we raise exceptions if there are problems with the load? (default is True)

garak._plugins.plugin_info(plugin: Callable | str) dict