garak.detectors.base

This module defines the basic structure of garak’s detectors. All detectors inherit from garak.detectors.base.Detector.

Attributes

1. doc_uri: URI for documentation of the detector (perhaps a paper).
2. lang_spec: Language this detector is for. Format: a comma-separated list of BCP47 tags, or “*” for any or not applicable. Content returned by a target can be in more than one language, and a single detector can be capable of processing input in more than one language. This field tracks which ones are supported. NB this is different from probe, which is monolingual and uses lang.
3. active: Should this detector be used by default?
4. tags: MISP-format taxonomy categories.
5. modality: Which modalities does this detector work on? garak supports mainstream any-to-any large models, but only assesses text output.
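The lang_spec format described above can be illustrated with a small check. This is a sketch only: the helper name lang_spec_supports is hypothetical and is not part of garak, and letting a bare primary tag such as "en" cover a regional tag such as "en-US" is an assumption made for illustration.

```python
def lang_spec_supports(lang_spec: str, lang: str) -> bool:
    """Return True if the BCP47 tag `lang` is covered by `lang_spec`.

    Hypothetical helper, not a garak API. `lang_spec` is a comma-separated
    list of BCP47 tags, or "*" for any / not applicable.
    """
    tags = [t.strip().lower() for t in lang_spec.split(",") if t.strip()]
    if "*" in tags:
        return True  # wildcard: any language is accepted
    wanted = lang.strip().lower()
    # Assumption: a bare primary tag ("en") also covers regional tags ("en-US").
    return any(wanted == t or wanted.startswith(t + "-") for t in tags)
```

For instance, under these assumptions a detector declaring lang_spec = "en,fr" would accept output tagged "fr" or "en-US" but not "de".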

Base detectors.

These detectors show expectations and operations of any detector implemented in garak. garak detectors must inherit from a class in here.

class Detector(config_root=garak._config)

Bases: Configurable

Base class for objects that define a way of detecting a probe hit / LLM failure

Configurable parameters:

DEFAULT_PARAMS contents:

  • skip = False

Default values are listed

See also Configuring garak for how to set these values.

Other attributes:

active: bool = True
detect(attempt: Attempt) → Iterable[float | None]

Takes an Attempt, classifies its outputs, and returns a list of results in the range 0.0-1.0, where 0.0 represents no hit and 1.0 represents a successful hit (i.e. a vulnerability was found).

doc_uri: str = None
extra_dependency_names = []
hit_desc = 'Detector identified matching indicators of risk'
lang_spec: str = None
modality: dict = {'out': {'text'}}
pass_desc = 'Detector found no issue'
tags = []
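The contract above (one score per output, each in the range 0.0-1.0) can be sketched without importing garak. Everything in this block is a stand-in: StubAttempt and StubDetector are hypothetical classes that only mimic the attributes listed on this page, and returning None for an output that cannot be scored is an assumption suggested by the float | None return type.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class StubAttempt:
    """Hypothetical stand-in for garak's Attempt: holds outputs and notes only."""
    outputs: List[Optional[str]]
    notes: dict = field(default_factory=dict)


class StubDetector:
    """Hypothetical stand-in mirroring the class attributes documented above."""
    doc_uri = None
    lang_spec = None
    active = True
    tags = []
    modality = {"out": {"text"}}

    def detect(self, attempt: StubAttempt) -> Iterable[Optional[float]]:
        raise NotImplementedError


class ContainsDigitDetector(StubDetector):
    """Toy detector: registers a hit when an output contains any digit."""
    lang_spec = "*"  # digits are language-independent

    def detect(self, attempt: StubAttempt) -> List[Optional[float]]:
        results: List[Optional[float]] = []
        for output in attempt.outputs:
            if output is None:
                results.append(None)  # assumption: nothing to score
            else:
                results.append(1.0 if any(c.isdigit() for c in output) else 0.0)
        return results
```

A real detector would subclass garak.detectors.base.Detector rather than a stub; the stub is used here only so the example runs on its own.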
class FileDetector(config_root=garak._config)

Bases: Detector

Detector subclass for processing attempts whose outputs are filenames for checking

detect(attempt: Attempt) → Iterable[float | None]

Takes an Attempt, classifies its outputs, and returns a list of results in the range 0.0-1.0, where 0.0 represents no hit and 1.0 represents a successful hit (i.e. a vulnerability was found).

valid_format = 'local filename'
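As a rough illustration of a detector whose outputs are local filenames, the sketch below scores each file with a caller-supplied check. The function file_scores and its test_file parameter are hypothetical names, and returning None for entries that are missing or are not regular files is an assumption, not documented garak behaviour.

```python
import os
from typing import Callable, List, Optional


def file_scores(
    filenames: List[Optional[str]],
    test_file: Callable[[str], float],
) -> List[Optional[float]]:
    """Score each local filename with `test_file` (hypothetical helper)."""
    results: List[Optional[float]] = []
    for name in filenames:
        if name is None or not os.path.isfile(name):
            # Assumption: entries that cannot be opened are left unscored.
            results.append(None)
        else:
            results.append(float(test_file(name)))
    return results
```

Here test_file would hold the detector-specific logic, for example returning 1.0 when a file contains a risky payload and 0.0 otherwise.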
class HFDetector(config_root=garak._config)

Bases: Detector, HFCompatible

Detector using a Hugging Face model

Configurable parameters:

DEFAULT_PARAMS contents:

  • skip = False

  • hf_args = {'device': 'cpu'}

  • tokenizer_kwargs = {'padding': True, 'truncation': True}

  • graceful_fail = True

Default values are listed

See also Configuring garak for how to set these values.

Other attributes:

detect(attempt: Attempt) → List[float | None]

Takes an Attempt, classifies its outputs, and returns a list of results in the range 0.0-1.0, where 0.0 represents no hit and 1.0 represents a successful hit (i.e. a vulnerability was found).

class StringDetector(substrings, config_root=garak._config)

Bases: Detector

Subclass of Detector using a list of substrings as detection triggers

Configurable parameters:

DEFAULT_PARAMS contents:

  • skip = False

  • matchtype = 'str'

  • case_sensitive = False

Default values are listed

See also Configuring garak for how to set these values.

Other attributes:

detect(attempt: Attempt) → Iterable[float | None]

Takes an Attempt, classifies its outputs, and returns a list of results in the range 0.0-1.0, where 0.0 represents no hit and 1.0 represents a successful hit (i.e. a vulnerability was found).
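A minimal sketch of substring-based detection, showing how the matchtype and case_sensitive parameters could plausibly interact. The function substring_scores is a hypothetical stand-in, not garak's implementation. Only matchtype = 'str' appears in the defaults above; the 'word' mode (whole-word matching) is included as an assumption for contrast.

```python
import re
from typing import List, Optional


def substring_scores(
    outputs: List[Optional[str]],
    substrings: List[str],
    matchtype: str = "str",
    case_sensitive: bool = False,
) -> List[Optional[float]]:
    """Return 1.0 for each output containing any substring, else 0.0."""
    results: List[Optional[float]] = []
    for output in outputs:
        if output is None:
            results.append(None)  # assumption: missing outputs are unscored
            continue
        hit = False
        for s in substrings:
            if matchtype == "str":
                # Plain containment, optionally case-folded.
                hay = output if case_sensitive else output.lower()
                needle = s if case_sensitive else s.lower()
                hit = needle in hay
            elif matchtype == "word":
                # Assumed mode: match only on word boundaries.
                flags = 0 if case_sensitive else re.IGNORECASE
                pattern = r"\b" + re.escape(s) + r"\b"
                hit = re.search(pattern, output, flags) is not None
            else:
                raise ValueError(f"unsupported matchtype: {matchtype!r}")
            if hit:
                break
        results.append(1.0 if hit else 0.0)
    return results
```

With the defaults, "I HATE this" is a hit for the substring "hate". Setting case_sensitive to True turns that into a miss, and the assumed 'word' mode would not match "hate" inside "hateful".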

class TriggerListDetector(config_root=garak._config)

Bases: Detector

Subclass of Detector using the attempt’s notes["triggers"] as the list of strings to match

Configurable parameters:

DEFAULT_PARAMS contents:

  • skip = False

  • case_sensitive = False

Default values are listed

See also Configuring garak for how to set these values.

Other attributes:

detect(attempt: Attempt) → Iterable[float | None]

Takes an Attempt, classifies its outputs, and returns a list of results in the range 0.0-1.0, where 0.0 represents no hit and 1.0 represents a successful hit (i.e. a vulnerability was found).
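The difference from a fixed substring list is that the strings to match travel with each attempt. The sketch below illustrates that idea with a plain dict standing in for the attempt's notes. The function trigger_scores is hypothetical, and both wrapping a single string trigger in a list and scoring 0.0 when no triggers are present are assumptions.

```python
from typing import List, Optional


def trigger_scores(
    outputs: List[Optional[str]],
    notes: dict,
    case_sensitive: bool = False,
) -> List[Optional[float]]:
    """Score outputs against the strings stored under notes["triggers"]."""
    triggers = notes.get("triggers", [])
    if isinstance(triggers, str):
        triggers = [triggers]  # assumption: tolerate a single string
    if not case_sensitive:
        triggers = [t.lower() for t in triggers]
    results: List[Optional[float]] = []
    for output in outputs:
        if output is None:
            results.append(None)  # assumption: missing outputs are unscored
            continue
        text = output if case_sensitive else output.lower()
        results.append(1.0 if any(t in text for t in triggers) else 0.0)
    return results
```

A probe would populate notes["triggers"] when it builds the attempt, so the same detector can serve many prompts with different expected strings.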