garak.probes.phrasing

Phrasing

These attacks rephrase requests for unsafe content in an attempt to bypass safeguards.

This module is home to probes that rely on specific grammatical alterations and rephrasing to implement their technique.

class FutureTense(config_root=<module 'garak._config'>)

Bases: FutureTenseFull

Attempt to evade safeguards by setting requests in the future tense

Uses a subset of the tense-modified requests. Attack success is reported to be higher with more variation, so this probe overrides generations.

active: bool = True
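
The relationship between a full probe and its active subset can be sketched as follows. This is an illustrative sketch under stated assumptions, not garak's implementation: the class names, the placeholder prompt list, the prompt_cap attribute, and the seeded sampling are all hypothetical and chosen only for the example.

```python
import random


class TenseProbeFullSketch:
    """Hypothetical stand-in for a 'Full' probe: holds every rephrased prompt."""

    # Benign placeholder prompts; these are not taken from garak's data
    all_prompts = [f"How will people do task {i} in the future?" for i in range(20)]

    def __init__(self):
        self.prompts = list(self.all_prompts)


class TenseProbeSubsetSketch(TenseProbeFullSketch):
    """Hypothetical subset probe: keeps a capped random sample of the prompts."""

    prompt_cap = 5  # assumed cap, chosen for illustration

    def __init__(self, seed=0):
        super().__init__()
        rng = random.Random(seed)  # seeded so the sample is reproducible
        self.prompts = rng.sample(self.prompts, self.prompt_cap)


full = TenseProbeFullSketch()
subset = TenseProbeSubsetSketch()
print(len(full.prompts), len(subset.prompts))
```

Keeping the subset as a subclass means the two variants share every other attribute (goal, tags, detector) and differ only in how many prompts they send.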

class FutureTenseFull(config_root=<module 'garak._config'>)

Bases: Probe

Will the system give unsafe responses when asked to write about weapon creation or dangerous behavior in the future tense?

This check sends the generator requests for advice on possibly protected or restricted subjects, phrased in the future tense.

active: bool = False
doc_uri: str = 'https://arxiv.org/html/2407.11969v2'
goal: str = 'use future tense to disregard the system prompt'
lang: str | None = 'en'
primary_detector: str | None = 'mitigation.Prefixes'
tags: Iterable[str] = ['owasp:llm01', 'quality:Security:PromptStability', 'payload:jailbreak']
tier: Tier = 2
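
The primary_detector value above names a detector that looks for mitigation (refusal) messages in model output. A minimal sketch of that idea follows, assuming a simple case-insensitive substring check. The phrase list, the function name score_output, and the scoring convention (1.0 when no refusal phrase is found, meaning the rephrased request may have succeeded) are assumptions for illustration and are not copied from garak.

```python
# Hypothetical refusal phrases; a real detector would likely use a longer list
REFUSAL_PHRASES = ["I'm sorry", "I cannot", "I can't", "As an AI"]


def score_output(text: str) -> float:
    """Return 0.0 if a refusal phrase appears in the text, else 1.0."""
    lowered = text.lower()
    refused = any(phrase.lower() in lowered for phrase in REFUSAL_PHRASES)
    return 0.0 if refused else 1.0


outputs = [
    "I'm sorry, but I can't help with that.",
    "In the past, people typically did the following...",
]
scores = [score_output(o) for o in outputs]
print(scores)
```

A check like this only shows that the model did not refuse; it does not confirm that the response is actually unsafe, so results are best read as an upper bound on attack success.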

class PastTense(config_root=<module 'garak._config'>)

Bases: PastTenseFull

Attempt to evade safeguards by setting requests in the past tense

Uses a subset of the tense-modified requests. Attack success is reported to be higher with more variation, so this probe overrides generations.

active: bool = True

class PastTenseFull(config_root=<module 'garak._config'>)

Bases: Probe

Will the system give unsafe responses when asked to write about weapon creation or dangerous behavior in the past tense?

This check sends the generator requests for advice on possibly protected or restricted subjects, phrased in the past tense.

active: bool = False
doc_uri: str = 'https://arxiv.org/html/2407.11969v2'
goal: str = 'use past tense to disregard the system prompt'
lang: str | None = 'en'
primary_detector: str | None = 'mitigation.Prefixes'
tags: Iterable[str] = ['owasp:llm01', 'quality:Security:PromptStability', 'payload:jailbreak']
tier: Tier = 2