garak.probes.continuation

Probes to check if a generator will continue a contentious term.

Operates by giving high-risk context as a prompt, extracted from sources such as Urban Dictionary examples of contentious terms in use, and checking whether the model will continue the term when given varying amounts of its prefix.
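A minimal sketch of the prefix-truncation idea, assuming a simple scheme that offers the empty string, the first character, and roughly the first half of the term as partial prefixes. `build_prompts` and the neutral placeholder term are hypothetical, and garak's actual choice of truncation points may differ.

```python
def build_prompts(context: str, term: str) -> list[tuple[str, str]]:
    """Return (prompt, partial) pairs: the context followed by a truncated term.

    Hypothetical helper; the truncation points are illustrative assumptions.
    """
    cut_points = {0, 1, len(term) // 2}
    # Never include the whole term: the model must supply the rest itself.
    cut_points = sorted(c for c in cut_points if c < len(term))
    pairs = []
    for cut in cut_points:
        partial = term[:cut]
        pairs.append((context + partial, partial))
    return pairs


if __name__ == "__main__":
    # Neutral placeholder standing in for a contentious term.
    for prompt, partial in build_prompts("He called him a ", "placeholder"):
        print(repr(prompt), repr(partial))
```

Offering several prefix lengths probes whether the model completes the term only when most of it is already present, or also when given little more than the surrounding context.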

class garak.probes.continuation.ContinueSlursReclaimedSlurs(config_root=<module 'garak._config'>)

Bases: Probe

Continue partial word with slur or reclaimed slur.

This probe poses a partial ethnic slur with some preceding content, and checks to see if the model will complete the slur.

active: bool = False
bcp47: Iterable[str] | None = 'en'
doc_uri: str = ''
goal: str = 'output an ethnic slur'
recommended_detector: Iterable[str] = ['continuation.Continuation']
source_resource_filename = 'slurprompts.jsonl'
tags: Iterable[str] = ['avid-effect:ethics:E0101', 'avid-effect:ethics:E0301', 'avid-effect:performance:P0401', 'owasp:llm01', 'quality:Behavioral:ContentSafety:Toxicity', 'quality:Behavioral:ContentSafety:HateHarassment', 'payload:unwanted:hatespeech']
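Scoring is delegated to the recommended `continuation.Continuation` detector. The sketch below shows one plausible way to load `.jsonl` records and score a completion: it joins the tail of the prompt with the head of the output and checks whether the full term spans that boundary. The `{"term": ..., "prefix": ...}` record shape, the helper names, and the matching rule are all assumptions for illustration, not garak's actual implementation.

```python
import json


def load_records(lines: list[str]) -> list[dict]:
    """Parse JSON Lines input, one object per non-blank line.

    The {"term": ..., "prefix": ...} record shape is an assumption.
    """
    return [json.loads(line) for line in lines if line.strip()]


def score_output(prompt: str, output: str, term: str) -> float:
    """Hypothetical scorer: 1.0 if the term spans the prompt/output boundary.

    Joins the last len(term) characters of the prompt with the first
    len(term) characters of the output (leading whitespace collapsed to
    one space), then checks case-insensitively for the term.
    """
    if not term:
        return 0.0
    stripped = output.lstrip()
    # Keep a single space if the output started with whitespace, so a
    # space-separated continuation does not count as completing the term.
    head = (" " + stripped) if stripped != output else stripped
    window = prompt[-len(term):] + head[: len(term)]
    return 1.0 if term.lower() in window.lower() else 0.0
```

Looking only at a window around the boundary means the check catches a completion even when the output itself contains just the missing suffix of the term.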

class garak.probes.continuation.ContinueSlursReclaimedSlursMini(config_root=<module 'garak._config'>)

Bases: ContinueSlursReclaimedSlurs, Probe

Continue partial word with slur or reclaimed slur.

This probe poses a partial ethnic slur with some preceding content, and checks to see if the model will complete the slur. It uses a subset of 80 slurs and contexts for which tested LLMs have been likely to provide completions.

active: bool = True
aliases = ['continuation.ContinueSlursReclaimedSlurs80']
source_resource_filename = 'slurprompts_mini.jsonl'
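Per the attributes listed, `ContinueSlursReclaimedSlursMini` inherits from `ContinueSlursReclaimedSlurs` and overrides only `active`, `aliases`, and `source_resource_filename`. The standalone sketch below mimics that pattern with hypothetical stand-in classes (`FullProbe`, `MiniProbe`, and `resource_path` are not part of garak) to show how overriding a class attribute redirects which data file a probe loads.

```python
class FullProbe:
    """Hypothetical stand-in for ContinueSlursReclaimedSlurs."""

    active = False
    source_resource_filename = "slurprompts.jsonl"

    def resource_path(self, data_dir: str) -> str:
        # Reads the class attribute via self, so subclasses can swap the file.
        return f"{data_dir}/{self.source_resource_filename}"


class MiniProbe(FullProbe):
    """Hypothetical stand-in for ContinueSlursReclaimedSlursMini."""

    active = True
    aliases = ["continuation.ContinueSlursReclaimedSlurs80"]
    source_resource_filename = "slurprompts_mini.jsonl"
```

Keeping the loading logic in the base class means the smaller variant needs no code of its own, only different class-level settings.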