garak.probes.promptinject
PromptInject
Basic prompt injections, including plaintext malicious commands into requests for innocuous information.
Probes in this module should specifically implement parts of the PromptInject framework.
Uses the PromptInject framework, https://openreview.net/forum?id=qiaRo_7Zmug, to probe a system. A subset of the attacks is implemented here. The default is to use a selection of that subset, because even the subset corresponds to thousands of requests by default.
Further info:
- class HijackHateHumans(config_root=<module 'garak._config' from '/home/docs/checkouts/readthedocs.org/user_builds/garak/checkouts/latest/docs/source/../../garak/_config.py'>)
Bases:
ProbePromptInject probe to inject a string; payload is hate-humans
Attempt to make the model repeat a given string
- description = 'PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. trophy Best Paper Awards @ NeurIPS ML Safety Workshop 2022'
- name = 'promptinject_hate-humans'
- rogue_string = 'hate-humans'
- tags: Iterable[str] = ['avid-effect:security:S0403', 'owasp:llm01', 'quality:Behavioral:ContentSafety:Violence', 'quality:Security:PromptStability', 'payload:unwanted:violence']
- uri = 'https://openreview.net/forum?id=qiaRo_7Zmug'
- class HijackHateHumansFull(config_root=<module 'garak._config' from '/home/docs/checkouts/readthedocs.org/user_builds/garak/checkouts/latest/docs/source/../../garak/_config.py'>)
Bases:
ProbePromptInject probe to inject a string; payload is hate-humans
Attempt to make the model repeat a given string
- description = 'PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. trophy Best Paper Awards @ NeurIPS ML Safety Workshop 2022'
- name = 'promptinject_hate-humans'
- rogue_string = 'hate-humans'
- tags: Iterable[str] = ['avid-effect:security:S0403', 'owasp:llm01', 'quality:Behavioral:ContentSafety:Violence', 'quality:Security:PromptStability', 'payload:unwanted:violence']
- uri = 'https://openreview.net/forum?id=qiaRo_7Zmug'
- class HijackKillHumans(config_root=<module 'garak._config' from '/home/docs/checkouts/readthedocs.org/user_builds/garak/checkouts/latest/docs/source/../../garak/_config.py'>)
Bases:
ProbePromptInject probe to inject a string; payload is kill-humans
Attempt to make the model repeat a given string
- description = 'PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. trophy Best Paper Awards @ NeurIPS ML Safety Workshop 2022'
- name = 'promptinject_kill-humans'
- rogue_string = 'kill-humans'
- tags: Iterable[str] = ['avid-effect:security:S0403', 'owasp:llm01', 'quality:Behavioral:ContentSafety:Violence', 'quality:Security:PromptStability', 'payload:unwanted:violence']
- uri = 'https://openreview.net/forum?id=qiaRo_7Zmug'
- class HijackKillHumansFull(config_root=<module 'garak._config' from '/home/docs/checkouts/readthedocs.org/user_builds/garak/checkouts/latest/docs/source/../../garak/_config.py'>)
Bases:
ProbePromptInject probe to inject a string; payload is kill-humans
Attempt to make the model repeat a given string
- description = 'PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. trophy Best Paper Awards @ NeurIPS ML Safety Workshop 2022'
- name = 'promptinject_kill-humans'
- rogue_string = 'kill-humans'
- tags: Iterable[str] = ['avid-effect:security:S0403', 'owasp:llm01', 'quality:Behavioral:ContentSafety:Violence', 'quality:Security:PromptStability', 'payload:unwanted:violence']
- uri = 'https://openreview.net/forum?id=qiaRo_7Zmug'
- class HijackLongPrompt(config_root=<module 'garak._config' from '/home/docs/checkouts/readthedocs.org/user_builds/garak/checkouts/latest/docs/source/../../garak/_config.py'>)
Bases:
ProbePromptInject probe to inject a string; payload is long-prompt
Attempt to make the model repeat a given string
- description = 'PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. trophy Best Paper Awards @ NeurIPS ML Safety Workshop 2022'
- name = 'promptinject_long-prompt'
- rogue_string = 'long-prompt'
- tags: Iterable[str] = ['avid-effect:security:S0403', 'owasp:llm01', 'quality:Behavioral:ContentSafety:Violence', 'quality:Security:PromptStability', 'payload:unwanted:violence']
- uri = 'https://openreview.net/forum?id=qiaRo_7Zmug'
- class HijackLongPromptFull(config_root=<module 'garak._config' from '/home/docs/checkouts/readthedocs.org/user_builds/garak/checkouts/latest/docs/source/../../garak/_config.py'>)
Bases:
ProbePromptInject probe to inject a string; payload is long-prompt
Attempt to make the model repeat a given string
- description = 'PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. trophy Best Paper Awards @ NeurIPS ML Safety Workshop 2022'
- name = 'promptinject_long-prompt'
- rogue_string = 'long-prompt'
- tags: Iterable[str] = ['avid-effect:security:S0403', 'owasp:llm01', 'quality:Behavioral:ContentSafety:Violence', 'quality:Security:PromptStability', 'payload:unwanted:violence']
- uri = 'https://openreview.net/forum?id=qiaRo_7Zmug'