garak.detectors.judge
Implements LLM as a Judge.
This works by instantiating an LLM via the generator interface to act as the judge. Judge LLMs need to support the OpenAI API within garak, i.e. they should inherit OpenAICompatible. This includes the OpenAI, NIM, Azure and Groq generators.
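For orientation, the snippet below is a hedged sketch of the parameters that select the judge model (they appear in DEFAULT_PARAMS of ModelAsJudge below). The values shown, and the idea that detector_model_config is passed through to the judge generator, are assumptions for illustration, not verbatim garak documentation.

```python
# Sketch only: parameter values one might set for the judge detector.
# In a garak run these would live under this detector's entry in the
# plugin configuration (exact nesting assumed, not reproduced here).
judge_params = {
    "detector_model_type": "openai",   # any OpenAICompatible generator family (OpenAI, NIM, Azure, Groq)
    "detector_model_name": "gpt-4o",   # hypothetical model name served by that endpoint
    "detector_model_config": None,     # optional extra config for the judge generator (assumed pass-through)
    "confidence_cutoff": 7,            # default: judge ratings at or above this count as a hit
}
```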
- class garak.detectors.judge.ModelAsJudge(config_root=<module 'garak._config'>)
Bases: Detector, EvaluationJudge
Generic configurable detector to utilize a generator as a judge
For system prompt interactions this detector relies on the [FastChat](https://github.com/lm-sys/fastchat) package, and is limited to chat models whose conversation templates are compatible with the OpenAI chat format.
- DEFAULT_PARAMS = {'confidence_cutoff': 7, 'detector_goal': None, 'detector_model_config': None, 'detector_model_name': 'meta/llama3-70b-instruct', 'detector_model_type': 'nim', 'skip': False}
- active = True
- description = 'Uses the configured model to detect responses that attained the probe goal'
- detect(attempt: Attempt) → List[float]
Takes an Attempt; classifies its outputs; returns a list of results in the range 0.0-1.0, where 0.0 represents no hit and 1.0 represents a successful hit (i.e. vulnerability found). A sketch of the score mapping follows this class entry.
- lang_spec = 'en'
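The following is a minimal sketch, not garak source, of how detect() presumably turns the judge's verdict into the scores described above, assuming the judge emits a 1-10 rating that is compared against confidence_cutoff (default 7 per DEFAULT_PARAMS).

```python
# Minimal sketch (not garak source): how a judge's numeric rating is
# presumably reduced to the 0.0 / 1.0 scores detect() returns, using the
# confidence_cutoff default of 7.
def score_from_rating(rating: int, confidence_cutoff: int = 7) -> float:
    """Map an assumed 1-10 judge rating to a detector hit score."""
    return 1.0 if rating >= confidence_cutoff else 0.0

print(score_from_rating(9))  # 1.0 -> judge believes the probe goal was attained
print(score_from_rating(3))  # 0.0 -> judge believes the response did not attain the goal
```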
- class garak.detectors.judge.Refusal(config_root=<module 'garak._config'>)
Bases: ModelAsJudge
Determine if the response is a refusal to answer
- DEFAULT_PARAMS = {'detector_model_config': None, 'detector_model_name': 'meta/llama3-70b-instruct', 'detector_model_type': 'nim', 'skip': False}
- active = True
- description = 'Uses the configured model to detect if the response is a refusal to answer'
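As a hedged sketch, Refusal exposes the same judge-model selection parameters as ModelAsJudge but, per its DEFAULT_PARAMS above, no confidence_cutoff or detector_goal, presumably because it asks the judge for a yes/no refusal verdict rather than a graded rating. The nesting these values take in a garak plugin config is assumed, not shown verbatim.

```python
# Sketch only: illustrative parameter values for the Refusal judge;
# keys mirror DEFAULT_PARAMS above, nesting in the plugin config assumed.
refusal_params = {
    "detector_model_type": "nim",                       # default judge backend
    "detector_model_name": "meta/llama3-70b-instruct",  # default judge model
    "detector_model_config": None,                      # optional extra generator config (assumed)
}
```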