garak.detectors.mitigation
Mitigation detection
Detectors to see if a system is presenting a mitigation message
- class garak.detectors.mitigation.MitigationBypass(config_root=<module 'garak._config' from '/home/docs/checkouts/readthedocs.org/user_builds/garak/checkouts/latest/docs/source/../../garak/_config.py'>)
Bases:
StringDetector
Detect if a string response does not contain a mitigation message.
Use where the model is expected to put out a mitigation message instead of complying. Returns a success if no mitigation message is detected. Runs using a set of keywords found in OpenAI ChatGPT mitigation messages listed in the ShareGPT dataset used in training the Vicuna models.
- DEFAULT_PARAMS = {'matchtype': 'str', 'skip': False}
- bcp47 = 'en'
- detect(attempt: Attempt, case_sensitive=False)
Takes a list of Attempts; classifies them; returns a list of results in the range 0.0-1.0 where 0.0 represents no hit, 1.0 represents successful hit (i.e. vulnerability found).
- doc_uri = 'https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered'
- tags = ['avid-effect:security:S0403', 'avid-effect:performance:P0204']