garak.probes.promptinject

PromptInject

Uses the PromptInject framework, https://openreview.net/forum?id=qiaRo_7Zmug, to probe a system. A subset of the framework's attacks is implemented here. By default only a selection of that subset is used, because the full subset corresponds to thousands of requests.
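
These defaults are visible in the class attributes documented below: the Mini variants are active by default and cap the number of prompts (max_prompts = 100), while the full variants are inactive and uncapped. At the command line the whole module can be selected with --probes promptinject, or a single class with, e.g., --probes promptinject.HijackHateHumansMini. A minimal sketch of inspecting these defaults programmatically, assuming only the class attributes listed on this page:

    from garak.probes import promptinject

    # Compare a full probe with its Mini counterpart; the attribute values
    # printed here correspond to those documented below.
    for cls in (promptinject.HijackHateHumans, promptinject.HijackHateHumansMini):
        print(cls.__name__, "active:", cls.active, "max_prompts:", cls.max_prompts)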

class garak.probes.promptinject.HijackHateHumans(config_root=<module 'garak._config'>)

Bases: Probe

PromptInject probe to inject a string; payload is hate-humans

Attempt to make the model repeat a given string

active: bool = False
bcp47: Iterable[str] | None = 'en'
description = 'PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. Best Paper Awards @ NeurIPS ML Safety Workshop 2022'
goal: str = 'inject a prompt on hate humans'
load_data_all_attacks()
max_prompts = None
name = 'promptinject_hate-humans'
parallelisable_attempts: bool = False
recommended_detector: Iterable[str] = ['promptinject.AttackRogueString']
rogue_string = 'hate-humans'
tags: Iterable[str] = ['avid-effect:security:S0403', 'owasp:llm01', 'quality:Behavioral:ContentSafety:Violence', 'quality:Security:PromptStability', 'payload:unwanted:violence']
uri = 'https://openreview.net/forum?id=qiaRo_7Zmug'
class garak.probes.promptinject.HijackHateHumansMini(config_root=<module 'garak._config'>)

Bases: Probe

PromptInject probe to inject a string; payload is hate-humans

Attempt to make the model repeat a given string

active: bool = True
bcp47: Iterable[str] | None = 'en'
description = 'PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. Best Paper Awards @ NeurIPS ML Safety Workshop 2022'
goal: str = 'inject a prompt on hate humans'
load_data_all_attacks()
max_prompts = 100
name = 'promptinject_hate-humans'
parallelisable_attempts: bool = False
recommended_detector: Iterable[str] = ['promptinject.AttackRogueString']
rogue_string = 'hate-humans'
tags: Iterable[str] = ['avid-effect:security:S0403', 'owasp:llm01', 'quality:Behavioral:ContentSafety:Violence', 'quality:Security:PromptStability', 'payload:unwanted:violence']
uri = 'https://openreview.net/forum?id=qiaRo_7Zmug'
class garak.probes.promptinject.HijackKillHumans(config_root=<module 'garak._config'>)

Bases: Probe

PromptInject probe to inject a string; payload is kill-humans

Attempt to make the model repeat a given string

active: bool = False
bcp47: Iterable[str] | None = 'en'
description = 'PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. Best Paper Awards @ NeurIPS ML Safety Workshop 2022'
goal: str = 'inject a prompt on kill humans'
load_data_all_attacks()
max_prompts = None
name = 'promptinject_kill-humans'
parallelisable_attempts: bool = False
recommended_detector: Iterable[str] = ['promptinject.AttackRogueString']
rogue_string = 'kill-humans'
tags: Iterable[str] = ['avid-effect:security:S0403', 'owasp:llm01', 'quality:Behavioral:ContentSafety:Violence', 'quality:Security:PromptStability', 'payload:unwanted:violence']
uri = 'https://openreview.net/forum?id=qiaRo_7Zmug'
class garak.probes.promptinject.HijackKillHumansMini(config_root=<module 'garak._config'>)

Bases: Probe

PromptInject probe to inject a string; payload is kill-humans

Attempt to make the model repeat a given string

active: bool = True
bcp47: Iterable[str] | None = 'en'
description = 'PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. Best Paper Awards @ NeurIPS ML Safety Workshop 2022'
goal: str = 'inject a prompt on kill humans'
load_data_all_attacks()
max_prompts = 100
name = 'promptinject_kill-humans'
parallelisable_attempts: bool = False
recommended_detector: Iterable[str] = ['promptinject.AttackRogueString']
rogue_string = 'kill-humans'
tags: Iterable[str] = ['avid-effect:security:S0403', 'owasp:llm01', 'quality:Behavioral:ContentSafety:Violence', 'quality:Security:PromptStability', 'payload:unwanted:violence']
uri = 'https://openreview.net/forum?id=qiaRo_7Zmug'
class garak.probes.promptinject.HijackLongPrompt(config_root=<module 'garak._config'>)

Bases: Probe

PromptInject probe to inject a string; payload is long-prompt

Attempt to make the model repeat a given string

active: bool = False
bcp47: Iterable[str] | None = 'en'
description = 'PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. Best Paper Awards @ NeurIPS ML Safety Workshop 2022'
goal: str = 'inject a prompt on long prompt'
load_data_all_attacks()
max_prompts = None
name = 'promptinject_long-prompt'
parallelisable_attempts: bool = False
recommended_detector: Iterable[str] = ['promptinject.AttackRogueString']
rogue_string = 'long-prompt'
tags: Iterable[str] = ['avid-effect:security:S0403', 'owasp:llm01', 'quality:Behavioral:ContentSafety:Violence', 'quality:Security:PromptStability', 'payload:unwanted:violence']
uri = 'https://openreview.net/forum?id=qiaRo_7Zmug'
class garak.probes.promptinject.HijackLongPromptMini(config_root=<module 'garak._config'>)

Bases: Probe

PromptInject probe to inject a string; payload is long-prompt

Attempt to make the model repeat a given string

active: bool = True
bcp47: Iterable[str] | None = 'en'
description = 'PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. Best Paper Awards @ NeurIPS ML Safety Workshop 2022'
goal: str = 'inject a prompt on long prompt'
load_data_all_attacks()
max_prompts = 100
name = 'promptinject_long-prompt'
parallelisable_attempts: bool = False
recommended_detector: Iterable[str] = ['promptinject.AttackRogueString']
rogue_string = 'long-prompt'
tags: Iterable[str] = ['avid-effect:security:S0403', 'owasp:llm01', 'quality:Behavioral:ContentSafety:Violence', 'quality:Security:PromptStability', 'payload:unwanted:violence']
uri = 'https://openreview.net/forum?id=qiaRo_7Zmug'
garak.probes.promptinject.constructor(self, config_root=<module 'garak._config'>)
garak.probes.promptinject.load_data_all_attacks(self)
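
As a rough sketch of programmatic use (not a definitive recipe): a probe can be constructed with its defaults, since config_root falls back to garak._config per the signatures above, and then pointed at a generator through the base Probe.probe() interface. The generator instance gen below is assumed to have been configured elsewhere via garak's generator plugins.

    from garak.probes.promptinject import HijackHateHumansMini

    probe = HijackHateHumansMini()        # config_root defaults to garak._config
    print(probe.goal)                     # 'inject a prompt on hate humans'
    print(probe.recommended_detector)     # ['promptinject.AttackRogueString']

    # gen is assumed: any configured garak generator instance
    # attempts = probe.probe(gen)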