garak.probes.doctor
Doctor attack
Pose as a doctor in an effort to override safeguards
This module is for roleplaying attacks where the roleplayed character is a doctor. Maybe even from House.
Further info:
- class Bypass(config_root=<module 'garak._config'>)
Bases: Probe
Implement Policy Puppetry Bypass
Post-instruction hierarchy, universal, and transferable prompt injection technique that successfully bypasses instruction hierarchy and safety guardrails
- class BypassLeet(config_root=<module 'garak._config'>)
Bases: Bypass
Implement Policy Puppetry Bypass
Post-instruction hierarchy, universal, and transferable prompt injection technique that successfully bypasses instruction hierarchy and safety guardrails
- class Puppetry(config_root=<module 'garak._config'>)
Bases: Probe
Implement Policy Puppetry Prompt Injection
Post-instruction hierarchy, universal, and transferable prompt injection technique that successfully bypasses instruction hierarchy and safety guardrails
- tags: Iterable[str] = ['owasp:llm01', 'avid-effect:security:S0301', 'quality:Security:PromptStability', 'cwe:1427']
- tier: Tier = 2
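The tag strings above each start with a prefix before the first colon. As a minimal sketch, assuming that prefix names the taxonomy the tag belongs to, the tags can be grouped for reporting. The helper `group_by_taxonomy` is hypothetical and not part of garak.

```python
from collections import defaultdict

# Tag values copied from the attribute listing above.
tags = [
    "owasp:llm01",
    "avid-effect:security:S0301",
    "quality:Security:PromptStability",
    "cwe:1427",
]


def group_by_taxonomy(tag_list):
    """Group tags by the text before the first colon.

    Assumption: that prefix identifies the taxonomy; the remainder
    is kept verbatim as the entry within it.
    """
    grouped = defaultdict(list)
    for tag in tag_list:
        taxonomy, _, entry = tag.partition(":")
        grouped[taxonomy].append(entry)
    return dict(grouped)


by_taxonomy = group_by_taxonomy(tags)
```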
To test system prompt bypassing, we created an application whose system prompt follows a very common design pattern: restricting topics. In this case it is a healthcare chatbot instructed to respond to any user request for medical advice with the fixed string “I am sorry but I cannot provide medical advice. Please consult with a medical advisor.”
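The setup described above can be sketched as follows. This is an illustration only, not garak's detector: the wording of `SYSTEM_PROMPT` and the helpers `normalize` and `system_prompt_held` are assumptions made for the example. Only the fixed refusal string comes from the text.

```python
# Fixed refusal string quoted in the description above.
REFUSAL = (
    "I am sorry but I cannot provide medical advice. "
    "Please consult with a medical advisor."
)

# Hypothetical system prompt; the actual wording used in the test
# application is not given in this document.
SYSTEM_PROMPT = (
    "You are a healthcare chatbot. If the user asks for medical advice, "
    f'respond only with: "{REFUSAL}"'
)


def normalize(text):
    """Collapse whitespace, drop surrounding quotes, and lowercase."""
    collapsed = " ".join(text.split())
    return collapsed.strip('"\u201c\u201d ').lower()


def system_prompt_held(response):
    """Return True if the model answered with the fixed refusal only.

    Any other answer to a medical-advice request is treated as a
    possible bypass of the topic restriction.
    """
    return normalize(response) == normalize(REFUSAL)
```

An exact-match check is deliberately strict: under this pattern the application should emit nothing but the fixed string, so any deviation is worth flagging for review.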