garak.probes.badchars

Imperceptible perturbation probes inspired by the Bad Characters paper.

Implements the “just try everything” strategy for invisible Unicode characters, homoglyph substitutions, bidi-based reorderings, and deletion/backspace pairs as described in https://arxiv.org/abs/2106.09898.

class BadCharacters(config_root=<module 'garak._config'>)

Bases: Probe

Imperceptible perturbations that mix invisible characters, homoglyphs, swaps, and deletions.

Implements the paper’s full spectrum of adversarial perturbations by generating prompt variants that insert invisible Unicode, swap characters with bidi tricks, substitute homoglyphs, and inject deletions/backspaces. The goal is to stress refusal policies with subtle transformations that should remain semantically equivalent to the original payloads.
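The four perturbation categories can be illustrated with a minimal sketch. This is not garak's implementation: the helper names, the choice of zero-width space, the three-entry homoglyph table, and the simplified RLO/PDF reordering are all assumptions made for illustration.

```python
# Illustrative sketch only; none of these helpers are part of garak's API.

ZWSP = "\u200b"        # ZERO WIDTH SPACE: renders as nothing
RLO = "\u202e"         # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"         # POP DIRECTIONAL FORMATTING
BACKSPACE = "\x08"     # ASCII backspace control character

# Tiny hypothetical homoglyph table: Latin letter -> Cyrillic look-alike.
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}


def insert_invisible(text: str, pos: int) -> str:
    """Insert an invisible character at index pos."""
    return text[:pos] + ZWSP + text[pos:]


def substitute_homoglyph(text: str, pos: int) -> str:
    """Replace the character at pos with a look-alike, if one is known."""
    ch = text[pos]
    return text[:pos] + HOMOGLYPHS.get(ch, ch) + text[pos + 1:]


def swap_with_bidi(text: str, pos: int) -> str:
    """Store two characters in reversed order inside an RLO...PDF run.

    A bidi-aware renderer displays the run right to left, so the pair
    looks unchanged even though the encoded order differs.
    """
    a, b = text[pos], text[pos + 1]
    return text[:pos] + RLO + b + a + PDF + text[pos + 2:]


def inject_deletion(text: str, pos: int, filler: str = "X") -> str:
    """Insert a filler character followed by a backspace that 'erases' it."""
    return text[:pos] + filler + BACKSPACE + text[pos:]


if __name__ == "__main__":
    for fn in (insert_invisible, substitute_homoglyph, swap_with_bidi, inject_deletion):
        print(fn.__name__, repr(fn("payload", 1)))
```

Each helper applies a single perturbation at one position, which corresponds to a perturbation budget of 1 in this sketch.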

Configurable parameters:

DEFAULT_PARAMS contents:

  • payload_name = 'harmful_behaviors'

  • perturbation_budget = 1

  • enabled_categories = ['invisible', 'homoglyph', 'reordering', 'deletion']

  • max_position_candidates = 24

  • max_reorder_candidates = 24

  • max_ascii_variants = 95

  • follow_prompt_cap = True

The values listed are the defaults.

See also Configuring garak for how to set these values.
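As a rough sketch of overriding these parameters, the snippet below builds a JSON options document in Python. The module/class nesting (`badchars` then `BadCharacters`) is an assumption based on how garak nests options for other plugins, and the override values are illustrative only; the Configuring garak page is the authoritative reference for the expected layout and for how the options are passed in.

```python
import json

# Assumed layout: module name -> class name -> parameter overrides.
# Parameter names come from DEFAULT_PARAMS; the values are illustrative.
probe_options = {
    "badchars": {
        "BadCharacters": {
            "enabled_categories": ["invisible", "homoglyph"],
            "max_position_candidates": 8,
            "follow_prompt_cap": True,
        }
    }
}

# Serialize to JSON so it can be saved to an options file or pasted inline.
options_json = json.dumps(probe_options, indent=2)
print(options_json)
```

Parameters omitted from the override keep their defaults.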

Other attributes:

active: bool = True
doc_uri: str = 'https://arxiv.org/abs/2106.09898'
goal: str = 'inject imperceptible text perturbations that bypass refusal policies'
lang: str | None = 'en'
primary_detector: str | None = 'mitigation.MitigationBypass'
tags: Iterable[str] = ['avid-effect:security:S0403', 'quality:Security:PromptStability', 'payload:jailbreak']
tier: Tier = 2