garak.probes.donotanswer
Do-Not-Answer Probes
https://github.com/Libr-AI/do-not-answer
https://arxiv.org/abs/2308.13387
An open-source dataset for evaluating LLMs' safety mechanisms at low cost. The dataset consists of prompts that language models are often trained not to answer.
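A minimal sketch of exercising one of these probes directly from Python, assuming a default garak install; the class and attribute names (`dna_category`, `prompts`) come from the reference below, while the loading behaviour shown is illustrative and may differ between garak versions. On the command line, a single category can be selected with e.g. `--probes donotanswer.InformationHazard`, or the whole module with `--probes donotanswer`.

```python
# Illustrative only: construct one Do-Not-Answer probe with garak's default
# configuration and peek at the prompts it loads.
from garak.probes.donotanswer import InformationHazard

probe = InformationHazard()        # defaults to the garak._config module
print(probe.dna_category)          # 'information_hazard'
print(len(probe.prompts))          # prompts filled from the bundled dataset
for prompt in probe.prompts[:3]:   # first few "should not answer" questions
    print(prompt)
```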
- class garak.probes.donotanswer.DiscriminationExclusionToxicityHatefulOffensive(config_root=_config)
Bases: Probe
Do Not Answer probe
A curated list of questions that LLMs should not answer, specifically testing the discrimination/exclusion/toxicity/hateful/offensive category.
- description = 'Implementation of discrimination_exclusion_toxicity_hateful_offensive probes in DoNotAnswer_Prompts'
- dna_category = 'discrimination_exclusion_toxicity_hateful_offensive'
- load_data()
- prompts = []
- tags: Iterable[str] = ['avid-effect:ethics:E0301', 'quality:Behavioral:ContentSafety:Toxicity', 'quality:Behavioral:ContentSafety:HateHarassment', 'quality:Behavioral:ContentSafety:Violence', 'quality:Behavioral:ContentSafety:Profanity']
- uri = 'https://arxiv.org/abs/2308.13387'
- class garak.probes.donotanswer.HumanChatbox(config_root=_config)
Bases: Probe
Do Not Answer probe
A curated list of questions that LLMs should not answer, specifically testing the human chatbox category.
- description = 'Implementation of human_chatbox probes in DoNotAnswer_Prompts'
- dna_category = 'human_chatbox'
- load_data()
- prompts = []
- tags: Iterable[str] = ['avid-effect:security:S0301', 'quality:Behavioral:ContentSafety:Unauthorized']
- uri = 'https://arxiv.org/abs/2308.13387'
- class garak.probes.donotanswer.InformationHazard(config_root=_config)
Bases: Probe
Do Not Answer probe
A curated list of questions that LLMs should not answer, specifically testing the information hazard category.
- description = 'Implementation of information_hazard probes in DoNotAnswer_Prompts'
- dna_category = 'information_hazard'
- load_data()
- prompts = []
- tags: Iterable[str] = ['owasp:llm06', 'avid-effect:security:S0301', 'quality:Security:Confidentiality']
- uri = 'https://arxiv.org/abs/2308.13387'
- class garak.probes.donotanswer.MaliciousUses(config_root=_config)
Bases: Probe
Do Not Answer probe
A curated list of questions that LLMs should not answer, specifically testing the malicious uses category.
- description = 'Implementation of malicious_uses probes in DoNotAnswer_Prompts'
- dna_category = 'malicious_uses'
- load_data()
- prompts = []
- uri = 'https://arxiv.org/abs/2308.13387'
- class garak.probes.donotanswer.MisinformationHarms(config_root=_config)
Bases: Probe
Do Not Answer probe
A curated list of questions that LLMs should not answer, specifically testing the misinformation harms category.
- description = 'Implementation of misinformation_harms probes in DoNotAnswer_Prompts'
- dna_category = 'misinformation_harms'
- load_data()
- prompts = []
- tags: Iterable[str] = ['quality:Robustness:GenerativeMisinformation', 'avid-effect:ethics:E0402', 'owasp:llm09']
- uri = 'https://arxiv.org/abs/2308.13387'
- garak.probes.donotanswer.load_local_data(self)
- garak.probes.donotanswer.local_constructor(self, config_root=_config)
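The two module-level helpers above are shared by every category class: `local_constructor` serves as each class's `__init__` and `load_local_data` as its `load_data`, so a probe's `prompts` list is populated from its category's data file at construction time. A simplified sketch of that wiring follows; the file path and the exact base-class call are assumptions for illustration, not garak's actual implementation.

```python
# Hedged sketch of the shared per-category wiring; the data path and details
# are assumptions, not garak's actual code.
from garak import _config
from garak.probes.base import Probe

def load_local_data(self):
    # Assumed layout: one Do-Not-Answer question per line, one file per category.
    with open(f"donotanswer/{self.dna_category}.txt", encoding="utf-8") as f:
        self.prompts = [line.strip() for line in f if line.strip()]

def local_constructor(self, config_root=_config):
    # Run the standard Probe setup, then pull in this category's prompts.
    Probe.__init__(self, config_root=config_root)
    self.load_data()
```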