garak.probes.continuation
Probes to check if a generator will continue a contentious term.
It works by prompting with high-risk context, extracted from sources such as Urban Dictionary usage examples of contentious terms, and checking whether the model continues the term when given varying amounts of its prefix.
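The "varying amounts of prefix" idea can be illustrated with a minimal sketch. It is not garak's own code: the function name `build_prompts`, the chosen cut points, and the neutral stand-in term `placeholder` are all assumptions made for illustration.

```python
def build_prompts(context: str, term: str) -> list[str]:
    """Return prompts that end in progressively longer prefixes of `term`.

    Hypothetical helper: the cut points below are an assumed scheme,
    not necessarily the one the probe uses.
    """
    # Assumed prefix lengths: nothing, first character, first two, first half.
    cut_points = [0, 1, 2, len(term) // 2]
    prompts: list[str] = []
    for cut in cut_points:
        candidate = context + term[:cut]
        if candidate not in prompts:  # preserve order, skip duplicates
            prompts.append(candidate)
    return prompts


# A neutral stand-in term is used in place of real probe data.
prompts = build_prompts("The word I am thinking of is ", "placeholder")
for prompt in prompts:
    print(repr(prompt))
```

Each prompt shares the same context and differs only in how much of the term is already written, so a run shows how much prefix a model needs before it continues the term.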
- class garak.probes.continuation.ContinueSlursReclaimedSlurs(config_root=<module 'garak._config'>)
Bases: Probe
Continue partial word with slur or reclaimed slur.
This probe poses a partial ethnic slur with some preceding content, and checks to see if the model will complete the slur.
- source_resource_filename = 'slurprompts.jsonl'
- class garak.probes.continuation.ContinueSlursReclaimedSlursMini(config_root=<module 'garak._config'>)
Bases: ContinueSlursReclaimedSlurs, Probe
Continue partial word with slur or reclaimed slur.
This probe poses a partial ethnic slur with some preceding content, and checks to see if the model will complete the slur. It uses a subset of 80 slurs and contexts for which test LLMs have been likely to provide completions.
- aliases = ['continuation.ContinueSlursReclaimedSlurs80']
- source_resource_filename = 'slurprompts_mini.jsonl'
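Both probes ask whether the model "will complete" the term. One simplified way to check this is sketched below, assuming that a completion counts when the end of the prompt joined to the start of the output spells the term. The function `completes_term` and the windowing rule are illustrative assumptions, not a description of garak's detector, and the term is again a neutral stand-in.

```python
def completes_term(prompt: str, output: str, term: str) -> bool:
    """Return True if the prompt tail plus the output head contains `term`.

    Hypothetical check: compares case-insensitively over a window that is
    at most len(term) characters on each side of the prompt/output seam.
    """
    if not term:
        return False  # an empty term cannot be "completed"
    n = len(term)
    # Join the last n characters of the prompt to the first n of the output.
    window = prompt[-n:] + output[:n]
    return term.lower() in window.lower()


# The partial term "pl" is finished by the output in the first case only.
hit = completes_term("The word is pl", "aceholder, of course", "placeholder")
miss = completes_term("The word is pl", "ease wait", "placeholder")
print(hit, miss)
```

Restricting the search to a window around the seam keeps the check focused on the continuation itself, so a later, unrelated mention of the term in a long output does not count as a hit.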