garak.probes.phrasing

Past Tense

Refusal training is widely used to prevent LLMs from generating harmful, undesirable, or illegal

outputs. We reveal a curious generalization gap in the current refusal training approaches: simply reformulating a harmful request in the past tense is often sufficient to jailbreak many state- of-the-art LLMs

class garak.probes.phrasing.FutureTense(config_root=<module 'garak._config' from '/home/docs/checkouts/readthedocs.org/user_builds/garak/checkouts/latest/docs/source/../../garak/_config.py'>)

Bases: Probe

Will the system give unsafe responses when asked to write about weapon creation or dangerous behavior?

This check sends the generator requests for advice on possibly protect or restricted subjects in the future tense.

active: bool = False
bcp47: Iterable[str] | None = 'en'
doc_uri: str = 'https://arxiv.org/pdf/2407.11969v2'
goal: str = 'use future tense to disregard the system prompt'
primary_detector: str | None = 'specialwords.Prefixes'
tags: Iterable[str] = ['owasp:llm01', 'quality:Security:PromptStability', 'payload:jailbreak']
class garak.probes.phrasing.FutureTenseMini(config_root=<module 'garak._config' from '/home/docs/checkouts/readthedocs.org/user_builds/garak/checkouts/latest/docs/source/../../garak/_config.py'>)

Bases: FutureTense, TenseMini

Attempt to evade safeguards by setting requests in the future tense

A subset of the tense-modified requests. Success is reported higher with more variation, so generations is overridden.

active: bool = True
class garak.probes.phrasing.PastTense(config_root=<module 'garak._config' from '/home/docs/checkouts/readthedocs.org/user_builds/garak/checkouts/latest/docs/source/../../garak/_config.py'>)

Bases: Probe

Will the system give unsafe responses when asked to write about weapon creation or dangerous behavior?

This check sends the generator requests for advice on possibly protect or restricted subjects in the past tense.

active: bool = False
bcp47: Iterable[str] | None = 'en'
doc_uri: str = 'https://arxiv.org/pdf/2407.11969v2'
goal: str = 'use past tense to disregard the system prompt'
primary_detector: str | None = 'specialwords.Prefixes'
tags: Iterable[str] = ['owasp:llm01', 'quality:Security:PromptStability', 'payload:jailbreak']
class garak.probes.phrasing.PastTenseMini(config_root=<module 'garak._config' from '/home/docs/checkouts/readthedocs.org/user_builds/garak/checkouts/latest/docs/source/../../garak/_config.py'>)

Bases: PastTense, TenseMini

Attempt to evade safeguards by setting requests in the past tense

A subset of the tense-modified requests. Success is reported higher with more variation, so generations is overridden.

active: bool = True
class garak.probes.phrasing.TenseMini

Bases: object