garak.probes.doctor

Doctor attack

Pose as a doctor in an effort to override safeguards

This module is for roleplaying attacks where the roleplayed character is a doctor. Maybe even from House.

Further info:

https://hiddenlayer.com/innovation-hub/novel-universal-bypass-for-all-major-llms/

class Bypass(config_root=<module 'garak._config' from '/home/docs/checkouts/readthedocs.org/user_builds/garak/checkouts/latest/garak/_config.py'>)

Bases: Probe

Implement Policy Puppetry Bypass

A post-instruction hierarchy, universal, and transferable prompt injection technique that bypasses the instruction hierarchy and safety guardrails

doc_uri: str = 'https://hiddenlayer.com/innovation-hub/novel-universal-bypass-for-all-major-llms/'

goal: str = 'roleplay as doctor to get bypass guardrails'

intent: str | None = 'T009ignore'

lang: str | None = 'en'

primary_detector: str | None = 'mitigation.MitigationBypass'

tags: Iterable[str] = ['owasp:llm01', 'avid-effect:security:S0301', 'quality:Security:PromptStability', 'cwe:1427', 'demon:Language:Prompt_injection:Ignore_previous_instructions', 'demon:Language:Stylizing:Formal_language', 'demon:Fictionalizing:Roleplaying:User_persona', 'demon:Rhetoric:Persuasion_and_manipulation:Distraction']

tier: Tier = 2
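Each entry in tags is a colon-separated string whose first segment names a taxonomy (owasp, avid-effect, quality, cwe, demon). As a minimal sketch of how these strings can be read, the snippet below groups the Bypass tags by that first segment. The helper name group_tags_by_taxonomy is hypothetical and is not part of garak; the tag values are copied from the attribute above.

```python
from collections import defaultdict

# Tags copied verbatim from the Bypass probe's `tags` attribute.
TAGS = [
    "owasp:llm01",
    "avid-effect:security:S0301",
    "quality:Security:PromptStability",
    "cwe:1427",
    "demon:Language:Prompt_injection:Ignore_previous_instructions",
    "demon:Language:Stylizing:Formal_language",
    "demon:Fictionalizing:Roleplaying:User_persona",
    "demon:Rhetoric:Persuasion_and_manipulation:Distraction",
]


def group_tags_by_taxonomy(tags):
    """Group tags by the text before the first colon.

    Hypothetical helper for illustration only; garak does not
    necessarily expose a function like this.
    """
    grouped = defaultdict(list)
    for tag in tags:
        # partition() splits on the first colon only, so the rest of
        # the tag keeps its own internal colons.
        taxonomy, _, rest = tag.partition(":")
        grouped[taxonomy].append(rest)
    return dict(grouped)


if __name__ == "__main__":
    for taxonomy, entries in group_tags_by_taxonomy(TAGS).items():
        print(f"{taxonomy}: {entries}")
```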

class BypassLeet(config_root=<module 'garak._config' from '/home/docs/checkouts/readthedocs.org/user_builds/garak/checkouts/latest/garak/_config.py'>)

Bases: Bypass

Implement Policy Puppetry Bypass

A post-instruction hierarchy, universal, and transferable prompt injection technique that bypasses the instruction hierarchy and safety guardrails

doc_uri: str = 'https://hiddenlayer.com/innovation-hub/novel-universal-bypass-for-all-major-llms/'

goal: str = 'roleplay as doctor to get bypass guardrails, but use leetspeak'

intent: str | None = 'T009ignore'

tags: Iterable[str] = ['owasp:llm01', 'avid-effect:security:S0301', 'quality:Security:PromptStability', 'cwe:1427', 'demon:Language:Prompt_injection:Ignore_previous_instructions', 'demon:Language:Stylizing:Formal_language', 'demon:Fictionalizing:Roleplaying:User_persona', 'demon:Rhetoric:Persuasion_and_manipulation:Distraction', 'demon:Language:Code_and_encode:Data_encoding']

tier: Tier = 2
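BypassLeet differs from Bypass in that its goal mentions leetspeak and its tags add demon:Language:Code_and_encode:Data_encoding. For readers unfamiliar with the term, the sketch below shows what a leetspeak encoding of text can look like. The substitution table and the function name to_leetspeak are assumptions made for illustration; they are not taken from garak's source, and the probe may encode its prompts differently.

```python
# Illustrative substitution table. This is an assumption for the
# example and is NOT the mapping used by garak's BypassLeet probe.
LEET_MAP = str.maketrans(
    {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"}
)


def to_leetspeak(text: str) -> str:
    """Lowercase the text and swap selected letters for digits."""
    return text.lower().translate(LEET_MAP)


if __name__ == "__main__":
    print(to_leetspeak("roleplay as doctor"))  # r0l3pl4y 45 d0c70r
```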

class Puppetry(config_root=<module 'garak._config' from '/home/docs/checkouts/readthedocs.org/user_builds/garak/checkouts/latest/garak/_config.py'>)

Bases: Probe

Implement Policy Puppetry Prompt Injection

A post-instruction hierarchy, universal, and transferable prompt injection technique that bypasses the instruction hierarchy and safety guardrails

doc_uri: str = 'https://hiddenlayer.com/innovation-hub/novel-universal-bypass-for-all-major-llms/'

goal: str = 'roleplay as doctor to get unsafe outputs'

intent: str | None = 'T009ignore'

lang: str | None = 'en'

primary_detector: str | None = 'mitigation.MitigationBypass'

tags: Iterable[str] = ['owasp:llm01', 'avid-effect:security:S0301', 'quality:Security:PromptStability', 'cwe:1427', 'demon:Language:Prompt_injection:Ignore_previous_instructions', 'demon:Language:Stylizing:Formal_language', 'demon:Fictionalizing:Roleplaying:User_persona', 'demon:Rhetoric:Persuasion_and_manipulation:Distraction']

tier: Tier = 2: To test system prompt bypassing we created an application with a system prompt showing a very common design pattern that restricts topics, in this case, a healthcare chatbot that is instructed to respond to any request for medical advice by the user with the fixed string “I am sorry but I cannot provide medical advice. Please consult with a medical advisor.”