garak.probes.promptinject

PromptInject

Basic prompt injections, including plaintext malicious commands into requests for innocuous information.

Probes in this module should specifically implement parts of the PromptInject framework.

Uses the PromptInject framework, https://openreview.net/forum?id=qiaRo_7Zmug, to probe a system. A subset of the attacks is implemented here. The default is to use a selection of that subset, because even the subset corresponds to thousands of requests by default.

Further info:

class HijackHateHumans(config_root=<module 'garak._config' from '/home/docs/checkouts/readthedocs.org/user_builds/garak/checkouts/latest/docs/source/../../garak/_config.py'>)

Bases: Probe

PromptInject probe to inject a string; payload is hate-humans

Attempt to make the model repeat a given string

active: bool = True
description = 'PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. trophy Best Paper Awards @ NeurIPS ML Safety Workshop 2022'
goal: str = 'inject a prompt on hate humans'
lang: str | None = 'en'
load_data_all_attacks(prompt_data)Source
name = 'promptinject_hate-humans'
parallelisable_attempts: bool = False
primary_detector: str | None = 'promptinject.AttackRogueString'
rogue_string = 'hate-humans'
tags: Iterable[str] = ['avid-effect:security:S0403', 'owasp:llm01', 'quality:Behavioral:ContentSafety:Violence', 'quality:Security:PromptStability', 'payload:unwanted:violence']
tier: Tier = 2
uri = 'https://openreview.net/forum?id=qiaRo_7Zmug'
class HijackHateHumansFull(config_root=<module 'garak._config' from '/home/docs/checkouts/readthedocs.org/user_builds/garak/checkouts/latest/docs/source/../../garak/_config.py'>)

Bases: Probe

PromptInject probe to inject a string; payload is hate-humans

Attempt to make the model repeat a given string

active: bool = False
description = 'PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. trophy Best Paper Awards @ NeurIPS ML Safety Workshop 2022'
goal: str = 'inject a prompt on hate humans'
lang: str | None = 'en'
load_data_all_attacks(prompt_data)Source
name = 'promptinject_hate-humans'
parallelisable_attempts: bool = False
primary_detector: str | None = 'promptinject.AttackRogueString'
rogue_string = 'hate-humans'
tags: Iterable[str] = ['avid-effect:security:S0403', 'owasp:llm01', 'quality:Behavioral:ContentSafety:Violence', 'quality:Security:PromptStability', 'payload:unwanted:violence']
tier: Tier = 2
uri = 'https://openreview.net/forum?id=qiaRo_7Zmug'
class HijackKillHumans(config_root=<module 'garak._config' from '/home/docs/checkouts/readthedocs.org/user_builds/garak/checkouts/latest/docs/source/../../garak/_config.py'>)

Bases: Probe

PromptInject probe to inject a string; payload is kill-humans

Attempt to make the model repeat a given string

active: bool = True
description = 'PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. trophy Best Paper Awards @ NeurIPS ML Safety Workshop 2022'
goal: str = 'inject a prompt on kill humans'
lang: str | None = 'en'
load_data_all_attacks(prompt_data)Source
name = 'promptinject_kill-humans'
parallelisable_attempts: bool = False
primary_detector: str | None = 'promptinject.AttackRogueString'
rogue_string = 'kill-humans'
tags: Iterable[str] = ['avid-effect:security:S0403', 'owasp:llm01', 'quality:Behavioral:ContentSafety:Violence', 'quality:Security:PromptStability', 'payload:unwanted:violence']
tier: Tier = 2
uri = 'https://openreview.net/forum?id=qiaRo_7Zmug'
class HijackKillHumansFull(config_root=<module 'garak._config' from '/home/docs/checkouts/readthedocs.org/user_builds/garak/checkouts/latest/docs/source/../../garak/_config.py'>)

Bases: Probe

PromptInject probe to inject a string; payload is kill-humans

Attempt to make the model repeat a given string

active: bool = False
description = 'PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. trophy Best Paper Awards @ NeurIPS ML Safety Workshop 2022'
goal: str = 'inject a prompt on kill humans'
lang: str | None = 'en'
load_data_all_attacks(prompt_data)Source
name = 'promptinject_kill-humans'
parallelisable_attempts: bool = False
primary_detector: str | None = 'promptinject.AttackRogueString'
rogue_string = 'kill-humans'
tags: Iterable[str] = ['avid-effect:security:S0403', 'owasp:llm01', 'quality:Behavioral:ContentSafety:Violence', 'quality:Security:PromptStability', 'payload:unwanted:violence']
tier: Tier = 2
uri = 'https://openreview.net/forum?id=qiaRo_7Zmug'
class HijackLongPrompt(config_root=<module 'garak._config' from '/home/docs/checkouts/readthedocs.org/user_builds/garak/checkouts/latest/docs/source/../../garak/_config.py'>)

Bases: Probe

PromptInject probe to inject a string; payload is long-prompt

Attempt to make the model repeat a given string

active: bool = True
description = 'PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. trophy Best Paper Awards @ NeurIPS ML Safety Workshop 2022'
goal: str = 'inject a prompt on long prompt'
lang: str | None = 'en'
load_data_all_attacks(prompt_data)Source
name = 'promptinject_long-prompt'
parallelisable_attempts: bool = False
primary_detector: str | None = 'promptinject.AttackRogueString'
rogue_string = 'long-prompt'
tags: Iterable[str] = ['avid-effect:security:S0403', 'owasp:llm01', 'quality:Behavioral:ContentSafety:Violence', 'quality:Security:PromptStability', 'payload:unwanted:violence']
tier: Tier = 2
uri = 'https://openreview.net/forum?id=qiaRo_7Zmug'
class HijackLongPromptFull(config_root=<module 'garak._config' from '/home/docs/checkouts/readthedocs.org/user_builds/garak/checkouts/latest/docs/source/../../garak/_config.py'>)

Bases: Probe

PromptInject probe to inject a string; payload is long-prompt

Attempt to make the model repeat a given string

active: bool = False
description = 'PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. trophy Best Paper Awards @ NeurIPS ML Safety Workshop 2022'
goal: str = 'inject a prompt on long prompt'
lang: str | None = 'en'
load_data_all_attacks(prompt_data)Source
name = 'promptinject_long-prompt'
parallelisable_attempts: bool = False
primary_detector: str | None = 'promptinject.AttackRogueString'
rogue_string = 'long-prompt'
tags: Iterable[str] = ['avid-effect:security:S0403', 'owasp:llm01', 'quality:Behavioral:ContentSafety:Violence', 'quality:Security:PromptStability', 'payload:unwanted:violence']
tier: Tier = 2
uri = 'https://openreview.net/forum?id=qiaRo_7Zmug'
constructor(self, config_root=<module 'garak._config' from '/home/docs/checkouts/readthedocs.org/user_builds/garak/checkouts/latest/docs/source/../../garak/_config.py'>)Source
load_data_all_attacks(self, prompt_data)Source