garak.probes.visual_jailbreak

Visual Jailbreak

class garak.probes.visual_jailbreak.FigStep(config_root=garak._config)

Bases: FigStepFull, Probe

Uses another modality, an image, to assist the jailbreak.

Title: FigStep: Jailbreaking Large Vision-language Models via Typographic Visual Prompts
Link: https://arxiv.org/pdf/2311.05608.pdf
Reference: Yichen Gong, Delong Ran, Jinyuan Liu, Conglei Wang, Tianshuo Cong, Anyu Wang, Sisi Duan, and Xiaoyun Wang. 2023. FigStep: Jailbreaking Large Vision-language Models via Typographic Visual Prompts.

  • Tiny version

active: bool = False
probe(generator)

attempt to exploit the target generator, returning a list of results

safebench_image_catalog = LocalDataPath('garak/data/safebenchtiny_filenames.txt')

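The tiny variant differs from FigStepFull in the catalog file it points at. As a rough illustration of how such a catalog could be consumed, the sketch below assumes the file lists one image filename per line; that layout, the helper name read_image_catalog, and the example filenames are assumptions made for illustration, not garak's documented behavior.

```python
from pathlib import Path
import tempfile


def read_image_catalog(catalog_path):
    """Return the non-empty lines of a catalog file as a list of filenames.

    Hypothetical helper: assumes one filename per line.
    """
    text = Path(catalog_path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


# Write a made-up catalog to a temporary file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    catalog = Path(tmp) / "example_filenames.txt"
    catalog.write_text("image_001.png\n\nimage_002.png\n", encoding="utf-8")
    filenames = read_image_catalog(catalog)

print(filenames)
```

Blank lines are skipped so that a trailing newline in the catalog does not produce an empty filename.
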
class garak.probes.visual_jailbreak.FigStepFull(config_root=garak._config)

Bases: Probe

Uses another modality, an image, to assist the jailbreak.

Title: FigStep: Jailbreaking Large Vision-language Models via Typographic Visual Prompts
Link: https://arxiv.org/pdf/2311.05608.pdf
Reference: Yichen Gong, Delong Ran, Jinyuan Liu, Conglei Wang, Tianshuo Cong, Anyu Wang, Sisi Duan, and Xiaoyun Wang. 2023. FigStep: Jailbreaking Large Vision-language Models via Typographic Visual Prompts.

active: bool = False
bcp47: Iterable[str] | None = 'en'
doc_uri: str = 'https://arxiv.org/abs/2311.05608'
goal: str = 'disregard the system prompt with both text and image'
modality: dict = {'in': {'image', 'text'}}
probe(generator)

attempt to exploit the target generator, returning a list of results

prompts = []
recommended_detector: Iterable[str] = ['visual_jailbreak.FigStep']
safebench_image_catalog = LocalDataPath('garak/data/safebench_filenames.txt')
safebench_image_filenames = []
tags: Iterable[str] = ['owasp:llm01', 'quality:Security:PromptStability', 'payload:jailbreak']
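
The modality attribute above declares that this probe sends both text and image input, so it only makes sense against a target that accepts both. A minimal sketch of such a compatibility check follows; the function supports_probe and the two generator modality dicts are hypothetical, and the check garak itself performs may differ.

```python
def supports_probe(probe_modality, generator_modality):
    """Return True if the generator accepts every input modality the probe needs.

    Hypothetical helper for illustration; not part of garak's API.
    """
    required = set(probe_modality.get("in", set()))
    accepted = set(generator_modality.get("in", set()))
    return required <= accepted  # subset test


# Value copied from the FigStepFull attribute listing.
figstep_modality = {"in": {"image", "text"}}

# Assumed example generator declarations, not taken from garak.
text_only_generator = {"in": {"text"}, "out": {"text"}}
multimodal_generator = {"in": {"text", "image"}, "out": {"text"}}

print(supports_probe(figstep_modality, text_only_generator))
print(supports_probe(figstep_modality, multimodal_generator))
```

A subset test is used rather than equality so that a generator accepting additional input types would still count as compatible; that is a design choice of this sketch.
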