garak.probes.donotanswer
Do-Not-Answer Probes
Present the target with a request for contentious or unwanted content.
Probes in this module should be from the “Do Not Answer” paper.
Do-Not-Answer is an open-source dataset for evaluating LLM safety mechanisms at low cost. The dataset consists of prompts that language models are often trained not to answer.
The DNA tests are somewhat opinionated and don’t represent universal norms (nothing does), so use with care.
Further info:
* https://github.com/Libr-AI/do-not-answer
* https://arxiv.org/abs/2308.13387
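Each probe class documented below is tied to one category of the dataset through its `dna_category` attribute, and its `prompts` attribute is listed as empty at class level. The following is a minimal sketch of that pattern, assuming (this page does not confirm it) that prompts are read at instantiation from one text file per category. The `CategoryProbe` class, the `data_dir` parameter, and the file layout are hypothetical illustrations and are not garak's API.

```python
import tempfile
from pathlib import Path


class CategoryProbe:
    """Hypothetical stand-in for a category-keyed probe; not garak's Probe class."""

    dna_category = ""  # set by each subclass
    prompts = []       # empty at class level, filled per instance

    def __init__(self, data_dir):
        # Assumed layout: one UTF-8 text file per category, one prompt per line.
        path = Path(data_dir) / f"{self.dna_category}.txt"
        lines = path.read_text(encoding="utf-8").splitlines()
        # Assign a fresh list so the shared class-level list is never mutated.
        self.prompts = [line.strip() for line in lines if line.strip()]


class InformationHazardSketch(CategoryProbe):
    """Illustrative subclass that only sets the category name."""

    dna_category = "information_hazard"


def demo():
    """Write a placeholder prompt file and load it through the sketch class."""
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "information_hazard.txt"
        sample.write_text(
            "placeholder prompt one\n\nplaceholder prompt two\n",
            encoding="utf-8",
        )
        probe = InformationHazardSketch(tmp)
        return probe.prompts


loaded = demo()
```

Under these assumptions a subclass only needs to set `dna_category`, which would explain why the class entries differ mainly in that attribute, their description, and their tags.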
- class DiscriminationExclusionToxicityHatefulOffensive(config_root=<module 'garak._config'>)
Bases: Probe
Do Not Answer probe
A curated list of questions that LLMs should not answer, specifically testing the “discrimination, exclusion, toxicity, hateful, offensive” category
- description = 'Implementation of discrimination_exclusion_toxicity_hateful_offensive probes in DoNotAnswer_Prompts'
- dna_category = 'discrimination_exclusion_toxicity_hateful_offensive'
- prompts = []
- tags: Iterable[str] = ['avid-effect:ethics:E0301', 'quality:Behavioral:ContentSafety:Toxicity', 'quality:Behavioral:ContentSafety:HateHarassment', 'quality:Behavioral:ContentSafety:Violence', 'quality:Behavioral:ContentSafety:Profanity']
- uri = 'https://arxiv.org/abs/2308.13387'
- class HumanChatbox(config_root=<module 'garak._config'>)
Bases: Probe
Do Not Answer probe
A curated list of questions that LLMs should not answer, specifically testing the “human chatbox” category
- description = 'Implementation of human_chatbox probes in DoNotAnswer_Prompts'
- dna_category = 'human_chatbox'
- prompts = []
- tags: Iterable[str] = ['avid-effect:security:S0301', 'quality:Behavioral:ContentSafety:Unauthorized']
- uri = 'https://arxiv.org/abs/2308.13387'
- class InformationHazard(config_root=<module 'garak._config'>)
Bases: Probe
Do Not Answer probe
A curated list of questions that LLMs should not answer, specifically testing the “information hazard” category
- description = 'Implementation of information_hazard probes in DoNotAnswer_Prompts'
- dna_category = 'information_hazard'
- prompts = []
- tags: Iterable[str] = ['owasp:llm06', 'avid-effect:security:S0301', 'quality:Security:Confidentiality']
- uri = 'https://arxiv.org/abs/2308.13387'
- class MaliciousUses(config_root=<module 'garak._config'>)
Bases: Probe
Do Not Answer probe
A curated list of questions that LLMs should not answer, specifically testing the “malicious uses” category
- description = 'Implementation of malicious_uses probes in DoNotAnswer_Prompts'
- dna_category = 'malicious_uses'
- prompts = []
- uri = 'https://arxiv.org/abs/2308.13387'
- class MisinformationHarms(config_root=<module 'garak._config'>)
Bases: Probe
Do Not Answer probe
A curated list of questions that LLMs should not answer, specifically testing the “misinformation harms” category
- description = 'Implementation of misinformation_harms probes in DoNotAnswer_Prompts'
- dna_category = 'misinformation_harms'
- prompts = []
- tags: Iterable[str] = ['quality:Robustness:GenerativeMisinformation', 'avid-effect:ethics:E0402', 'owasp:llm09']
- uri = 'https://arxiv.org/abs/2308.13387'
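The `tags` values listed for these classes are colon-separated strings. The sketch below groups them by their leading field to show which probes carry tags from which scheme. Treating the text before the first colon as a taxonomy name is an assumption made for illustration, and `PROBE_TAGS`, `group_by_taxonomy`, and `probes_with_taxonomy` are hypothetical helpers, not part of garak. The tag lists are copied from the entries above; `MaliciousUses` shows no `tags` attribute on this page, so it is given an empty list here.

```python
from collections import defaultdict

# Tag lists copied from the class entries on this page.
PROBE_TAGS = {
    "DiscriminationExclusionToxicityHatefulOffensive": [
        "avid-effect:ethics:E0301",
        "quality:Behavioral:ContentSafety:Toxicity",
        "quality:Behavioral:ContentSafety:HateHarassment",
        "quality:Behavioral:ContentSafety:Violence",
        "quality:Behavioral:ContentSafety:Profanity",
    ],
    "HumanChatbox": [
        "avid-effect:security:S0301",
        "quality:Behavioral:ContentSafety:Unauthorized",
    ],
    "InformationHazard": [
        "owasp:llm06",
        "avid-effect:security:S0301",
        "quality:Security:Confidentiality",
    ],
    "MaliciousUses": [],  # no tags listed on this page
    "MisinformationHarms": [
        "quality:Robustness:GenerativeMisinformation",
        "avid-effect:ethics:E0402",
        "owasp:llm09",
    ],
}


def group_by_taxonomy(tags):
    """Group tags by the text before the first colon (assumed taxonomy name)."""
    grouped = defaultdict(list)
    for tag in tags:
        taxonomy, _, rest = tag.partition(":")
        grouped[taxonomy].append(rest)
    return dict(grouped)


def probes_with_taxonomy(taxonomy):
    """Return the sorted probe names that have at least one tag in the taxonomy."""
    return sorted(
        name
        for name, tags in PROBE_TAGS.items()
        if any(tag.partition(":")[0] == taxonomy for tag in tags)
    )
```

For example, `probes_with_taxonomy("owasp")` returns `['InformationHazard', 'MisinformationHarms']`, the two classes above whose tag lists include an `owasp:` entry.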