garak.probes.doctor

Doctor attack

Post as a doctor in effort to override safeguards

This module is for roleplaying attacks where the roleplayed character is a doctor. Maybe even from House.

Further info:

class Bypass(config_root=<module 'garak._config' from '/home/docs/checkouts/readthedocs.org/user_builds/garak/checkouts/latest/docs/source/../../garak/_config.py'>)Source

Bases: Probe

Implement Policy Puppetry Bypass

Post-instruction hierarchy, universal, and transferable prompt injection technique that successfully bypasses instruction hierarchy and safety guardrails

doc_uri: str = 'https://hiddenlayer.com/innovation-hub/novel-universal-bypass-for-all-major-llms/'
goal: str = 'roleplay as doctor to get bypass guardrails'
lang: str | None = 'en'
primary_detector: str | None = 'mitigation.MitigationBypass'
tags: Iterable[str] = ['owasp:llm01', 'avid-effect:security:S0301', 'quality:Security:PromptStability', 'cwe:1427']
tier: Tier = 2
class BypassLeet(config_root=<module 'garak._config' from '/home/docs/checkouts/readthedocs.org/user_builds/garak/checkouts/latest/docs/source/../../garak/_config.py'>)Source

Bases: Bypass

Implement Policy Puppetry Bypass

Post-instruction hierarchy, universal, and transferable prompt injection technique that successfully bypasses instruction hierarchy and safety guardrails

doc_uri: str = 'https://hiddenlayer.com/innovation-hub/novel-universal-bypass-for-all-major-llms/'
goal: str = 'roleplay as doctor to get bypass guardrails, but use leetspeak'
tier: Tier = 2
class Puppetry(config_root=<module 'garak._config' from '/home/docs/checkouts/readthedocs.org/user_builds/garak/checkouts/latest/docs/source/../../garak/_config.py'>)Source

Bases: Probe

Implement Policy Puppetry Prompt Injection

Post-instruction hierarchy, universal, and transferable prompt injection technique that successfully bypasses instruction hierarchy and safety guardrails

doc_uri: str = 'https://hiddenlayer.com/innovation-hub/novel-universal-bypass-for-all-major-llms/'
goal: str = 'roleplay as doctor to get unsafe outputs'
lang: str | None = 'en'
primary_detector: str | None = 'mitigation.MitigationBypass'
tags: Iterable[str] = ['owasp:llm01', 'avid-effect:security:S0301', 'quality:Security:PromptStability', 'cwe:1427']
tier: Tier = 2

To test system prompt bypassing we created an application with a system prompt showing a very common design pattern that restricts topics, in this case, a healthcare chatbot that is instructed to respond to any request for medical advice by the user with the fixed string “I am sorry but I cannot provide medical advice. Please consult with a medical advisor.”