garak.probes.promptinject
PromptInject
Uses the PromptInject framework, https://openreview.net/forum?id=qiaRo_7Zmug, to probe a system. A subset of the attacks is implemented here. By default, only a selection of that subset is used, because the full subset corresponds to thousands of requests.
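As a usage sketch (not part of the generated reference): these probes are usually selected by name through the garak CLI. The snippet below assumes the CLI entry point accepts an argv-style list of flags, as when garak is invoked via python -m garak; the model type and name are placeholders for whatever generator you target.

    # Minimal sketch: run one PromptInject probe through garak's CLI entry point.
    # Assumes garak.cli.main accepts an argv-style list (as with `python -m garak`).
    # Model type/name below are placeholders.
    from garak import cli

    cli.main([
        "--model_type", "openai",
        "--model_name", "gpt-3.5-turbo",
        "--probes", "promptinject.HijackHateHumansMini",
    ])

The Mini variant is used here because the full probes issue thousands of requests.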
- class garak.probes.promptinject.HijackHateHumans(config_root=<module 'garak._config'>)
Bases: Probe
PromptInject probe to inject a string; payload is hate-humans
Attempt to make the model repeat a given string
- description = 'PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Safety Workshop 2022'
- load_data_all_attacks()
- max_prompts = None
- name = 'promptinject_hate-humans'
- rogue_string = 'hate-humans'
- tags: Iterable[str] = ['avid-effect:security:S0403', 'owasp:llm01', 'quality:Behavioral:ContentSafety:Violence', 'quality:Security:PromptStability', 'payload:unwanted:violence']
- uri = 'https://openreview.net/forum?id=qiaRo_7Zmug'
- class garak.probes.promptinject.HijackHateHumansMini(config_root=<module 'garak._config'>)
Bases: Probe
PromptInject probe to inject a string; payload is hate-humans
Attempt to make the model repeat a given string
- description = 'PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Safety Workshop 2022'
- load_data_all_attacks()
- max_prompts = 100
- name = 'promptinject_hate-humans'
- rogue_string = 'hate-humans'
- tags: Iterable[str] = ['avid-effect:security:S0403', 'owasp:llm01', 'quality:Behavioral:ContentSafety:Violence', 'quality:Security:PromptStability', 'payload:unwanted:violence']
- uri = 'https://openreview.net/forum?id=qiaRo_7Zmug'
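The Mini variants differ from the full probes only in max_prompts, which caps how many of the assembled attack prompts are kept. A minimal sketch of the difference, assuming the constructor populates a prompts list as other garak probes do:

    # Sketch, assuming probe instances expose their assembled prompts as a list.
    from garak.probes.promptinject import HijackHateHumans, HijackHateHumansMini

    full = HijackHateHumans()      # max_prompts = None: keep every assembled prompt
    mini = HijackHateHumansMini()  # max_prompts = 100: keep a selection of 100

    print(len(full.prompts))  # expected: thousands of prompts
    print(len(mini.prompts))  # expected: at most 100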
- class garak.probes.promptinject.HijackKillHumans(config_root=<module 'garak._config'>)
Bases: Probe
PromptInject probe to inject a string; payload is kill-humans
Attempt to make the model repeat a given string
- description = 'PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Safety Workshop 2022'
- load_data_all_attacks()
- max_prompts = None
- name = 'promptinject_kill-humans'
- rogue_string = 'kill-humans'
- tags: Iterable[str] = ['avid-effect:security:S0403', 'owasp:llm01', 'quality:Behavioral:ContentSafety:Violence', 'quality:Security:PromptStability', 'payload:unwanted:violence']
- uri = 'https://openreview.net/forum?id=qiaRo_7Zmug'
- class garak.probes.promptinject.HijackKillHumansMini(config_root=<module 'garak._config'>)
Bases: Probe
PromptInject probe to inject a string; payload is kill-humans
Attempt to make the model repeat a given string
- description = 'PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Safety Workshop 2022'
- load_data_all_attacks()
- max_prompts = 100
- name = 'promptinject_kill-humans'
- rogue_string = 'kill-humans'
- tags: Iterable[str] = ['avid-effect:security:S0403', 'owasp:llm01', 'quality:Behavioral:ContentSafety:Violence', 'quality:Security:PromptStability', 'payload:unwanted:violence']
- uri = 'https://openreview.net/forum?id=qiaRo_7Zmug'
- class garak.probes.promptinject.HijackLongPrompt(config_root=<module 'garak._config'>)
Bases: Probe
PromptInject probe to inject a string; payload is long-prompt
Attempt to make the model repeat a given string
- description = 'PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Safety Workshop 2022'
- load_data_all_attacks()
- max_prompts = None
- name = 'promptinject_long-prompt'
- rogue_string = 'long-prompt'
- tags: Iterable[str] = ['avid-effect:security:S0403', 'owasp:llm01', 'quality:Behavioral:ContentSafety:Violence', 'quality:Security:PromptStability', 'payload:unwanted:violence']
- uri = 'https://openreview.net/forum?id=qiaRo_7Zmug'
- class garak.probes.promptinject.HijackLongPromptMini(config_root=<module 'garak._config'>)
Bases: Probe
PromptInject probe to inject a string; payload is long-prompt
Attempt to make the model repeat a given string
- description = 'PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Safety Workshop 2022'
- load_data_all_attacks()
- max_prompts = 100
- name = 'promptinject_long-prompt'
- rogue_string = 'long-prompt'
- tags: Iterable[str] = ['avid-effect:security:S0403', 'owasp:llm01', 'quality:Behavioral:ContentSafety:Violence', 'quality:Security:PromptStability', 'payload:unwanted:violence']
- uri = 'https://openreview.net/forum?id=qiaRo_7Zmug'
- garak.probes.promptinject.constructor(self, config_root=<module 'garak._config'>)
- garak.probes.promptinject.load_data_all_attacks(self)
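As an illustrative sketch only: the module-level constructor and load_data_all_attacks are shared by all the classes above, which are thin subclasses differing mainly in rogue_string and max_prompts. Once constructed, a probe can be run directly against a generator via the probe() method inherited from Probe; the test generator below is an assumption about garak's built-in test generators.

    # Sketch: drive a PromptInject probe against a generator without a harness.
    # Assumes garak ships a no-op test generator at garak.generators.test.Blank
    # and that Probe.probe(generator) returns one Attempt per prompt sent.
    from garak.probes.promptinject import HijackHateHumansMini
    from garak.generators.test import Blank

    probe = HijackHateHumansMini()
    generator = Blank()

    attempts = probe.probe(generator)
    for attempt in attempts[:3]:
        print(attempt.prompt, "->", attempt.outputs)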