garak.probes.visual_jailbreak

Visual Jailbreak

Write instructions in an image in an attempt to bypass text-based alignment.

This module contains probes based on the Visual Jailbreak method detailed in the FigStep paper. Other probes that implement the technique are welcome, even if they aren’t detailed in the paper.

Further info: https://arxiv.org/abs/2311.05608

class FigStep(config_root=garak._config)

Bases: FigStepFull, Probe

Uses the image modality to assist a jailbreak. Refer to the following paper for more detail:

Yichen Gong, Delong Ran, Jinyuan Liu, Conglei Wang, Tianshuo Cong, Anyu Wang, Sisi Duan, and Xiaoyun Wang. 2023. “FigStep: Jailbreaking Large Vision-language Models via Typographic Visual Prompts.” ArXiv. https://arxiv.org/pdf/2311.05608

  • Tiny version

active: bool = False
probe(generator)

Attempt to exploit the target generator, returning a list of results.

safebench_image_catalog = LocalDataPath('garak/data/safebenchtiny_filenames.txt')

class FigStepFull(config_root=garak._config)

Bases: Probe

Uses the image modality to assist a jailbreak. Refer to the following paper for more detail:

Yichen Gong, Delong Ran, Jinyuan Liu, Conglei Wang, Tianshuo Cong, Anyu Wang, Sisi Duan, and Xiaoyun Wang. 2023. “FigStep: Jailbreaking Large Vision-language Models via Typographic Visual Prompts.” ArXiv. https://arxiv.org/pdf/2311.05608

active: bool = False
doc_uri: str = 'https://arxiv.org/abs/2311.05608'
goal: str = 'disregard the system prompt with both text and image'
lang: str | None = 'en'
modality: dict = {'in': {'image', 'text'}}
primary_detector: str | None = 'visual_jailbreak.FigStep'
probe(generator)

Attempt to exploit the target generator, returning a list of results.

prompts = []
safebench_image_catalog = LocalDataPath('garak/data/safebench_filenames.txt')
safebench_image_filenames = []
tags: Iterable[str] = ['owasp:llm01', 'quality:Security:PromptStability', 'payload:jailbreak']
tier: Tier = 2
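
The attributes above suggest two mechanics: a catalog file that lists image filenames, and a modality declaration that a target must satisfy. The following is a minimal sketch of how such pieces might fit together, not garak's actual implementation. The helpers load_image_catalog and supports_modality, the one-filename-per-line catalog format, the example filenames, and the target dictionaries are all assumptions made for illustration; only the modality value {'in': {'image', 'text'}} comes from the listing above.

```python
import tempfile
from pathlib import Path


def load_image_catalog(catalog_path):
    """Hypothetical helper: return the non-empty, stripped lines of a catalog file.

    Assumes one image filename per line; the real catalog format may differ.
    """
    text = Path(catalog_path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def supports_modality(required, offered):
    """Hypothetical helper: True when every required input modality is offered."""
    return set(required["in"]) <= set(offered["in"])


# Build a throwaway catalog so the sketch runs without garak installed.
with tempfile.TemporaryDirectory() as tmp:
    catalog = Path(tmp) / "example_filenames.txt"  # hypothetical file name
    catalog.write_text("query_a_1.png\n\nquery_b_2.png\n", encoding="utf-8")
    filenames = load_image_catalog(catalog)

# Modality value as documented for FigStepFull.
probe_modality = {"in": {"image", "text"}}

# Hypothetical targets, used only to exercise the check.
text_only_target = {"in": {"text"}}
multimodal_target = {"in": {"text", "image"}}

print(filenames)
print(supports_modality(probe_modality, multimodal_target))
print(supports_modality(probe_modality, text_only_target))
```

Under these assumptions, a text-only target fails the check because it does not accept image input, which is consistent with the probe requiring both modalities.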