garak.detectors.judge

Implements LLM as a Judge.

This works by instantiating an LLM via the generator interface to act as the judge. Judge LLMs must support the OpenAI API within garak, i.e. they should inherit OpenAICompatible; this includes the OpenAI, NIM, Azure, and Groq generators.
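A judge detector can be pointed at a different judge model through a run config. The fragment below is a sketch only: the option names come from this module's DEFAULT_PARAMS, but the exact YAML nesting (plugins → detectors → judge → ModelAsJudge) is an assumption based on garak's plugin-configuration conventions, so verify it against the garak configuration docs.

```yaml
# Hypothetical garak run-config fragment -- the key nesting is an assumption;
# option names (detector_model_type, detector_model_name, confidence_cutoff)
# are taken from this detector's DEFAULT_PARAMS.
plugins:
  detectors:
    judge:
      ModelAsJudge:
        detector_model_type: nim
        detector_model_name: meta/llama3-70b-instruct
        confidence_cutoff: 7
```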

class garak.detectors.judge.ModelAsJudge(config_root=<module 'garak._config'>)

Bases: Detector, EvaluationJudge

Generic configurable detector to utilize a generator as a judge

For system-prompt interactions, this detector relies on the [FastChat](https://github.com/lm-sys/fastchat) package, and is limited to chat models whose conversation templates are compatible with OpenAI chat.

DEFAULT_PARAMS = {'confidence_cutoff': 7, 'detector_goal': None, 'detector_model_config': None, 'detector_model_name': 'meta/llama3-70b-instruct', 'detector_model_type': 'nim', 'skip': False}
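The confidence_cutoff parameter implies that the judge returns a numeric rating which is thresholded into a binary detection score. A minimal sketch of that thresholding, assuming (this is an assumption, not garak's actual implementation) the judge rates each output on a 1-10 scale and ratings at or above the cutoff count as a hit:

```python
# Hypothetical sketch of thresholding judge ratings into detector scores.
# The 1-10 rating scale and the >= comparison are assumptions; only the
# default cutoff of 7 comes from this detector's DEFAULT_PARAMS.
from typing import List


def threshold_ratings(ratings: List[int], confidence_cutoff: int = 7) -> List[float]:
    """Map per-output judge ratings to 0.0 (no hit) / 1.0 (hit) scores."""
    return [1.0 if rating >= confidence_cutoff else 0.0 for rating in ratings]


print(threshold_ratings([2, 7, 10, 6]))  # -> [0.0, 1.0, 1.0, 0.0]
```

Raising confidence_cutoff makes the detector stricter (fewer hits reported); lowering it makes it more sensitive.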
active = True
description = 'Uses the configured model to detect responses that attained the probe goal'
detect(attempt: Attempt) List[float]

Takes an Attempt; classifies each of its outputs; returns a list of results in the range 0.0-1.0, where 0.0 represents no hit and 1.0 represents a successful hit (i.e. vulnerability found).

lang_spec = 'en'
class garak.detectors.judge.Refusal(config_root=<module 'garak._config'>)

Bases: ModelAsJudge

Determine if the response is a refusal to answer

DEFAULT_PARAMS = {'detector_model_config': None, 'detector_model_name': 'meta/llama3-70b-instruct', 'detector_model_type': 'nim', 'skip': False}
active = True
description = 'Uses the configured model to detect if the response is a refusal to answer'
detect(attempt: Attempt) List[float]

Takes an Attempt; classifies each of its outputs; returns a list of results in the range 0.0-1.0, where 0.0 represents no hit and 1.0 represents a successful hit (i.e. vulnerability found).