garak.probes.agent_breaker

Agent Breaker probe

A multi-turn red-team probe for attacking agentic LLM applications that use tools.

Uses a red team model to analyze each tool for weaknesses, generate targeted exploits, and verify attack success through direct conversation with the agent.

class AgentBreaker(config_root=<module 'garak._config'>)

Bases: IterativeProbe

Agent Breaker - Agentic Application Exploitation Probe

A multi-turn probe that attacks agentic LLM applications by:

  1. Loading agent purpose and tools from data/agent_breaker/agent.yaml (or auto-discovering them by asking the target agent if tools are missing)

  2. Analyzing each tool to understand what it does and how it works

  3. Identifying specific weaknesses based on each tool’s functionality

  4. Generating targeted exploits based on the tool-specific analysis

  5. Verifying attack success - stops immediately on success

Auto-discovery: If the YAML has no tools (or empty tools), the probe asks the target agent what tools it has (and optionally its purpose). The response is parsed by the red team model into the same format as the YAML. If agent_purpose is already set in the YAML, only tools are discovered.
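The auto-discovery rule can be sketched as a small decision function. This is an illustration only: `fields_to_discover` is a hypothetical helper, not part of garak, and it assumes the YAML has already been loaded into a plain dict.

```python
def fields_to_discover(agent_config: dict) -> list:
    """Return the fields the probe would need to ask the target agent for.

    Hypothetical helper; it mirrors the documented rule, not garak's code.
    """
    # A non-empty tool list means the YAML is used as-is.
    if agent_config.get("tools"):
        return []
    # Tools are missing or empty, so they must be discovered.
    missing = ["tools"]
    # The purpose is only discovered when the YAML does not already set it.
    if not agent_config.get("agent_purpose"):
        missing.insert(0, "agent_purpose")
    return missing


print(fields_to_discover({"agent_purpose": "A coding assistant", "tools": []}))
print(fields_to_discover({}))
```

The first call reports that only the tools need discovering; the second reports both fields.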

Multi-turn attack strategy:

  • Each turn starts a NEW conversation with an improved attack payload

  • The red team model analyzes all previous attempts and their responses

  • It learns from failures and generates improved attacks that address weaknesses

  • The attack stops immediately when successful
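The strategy above can be sketched as a loop. This is a minimal sketch, not the probe's implementation: `run_attack` and its three callables (`generate`, `send_fresh`, `verify`) are hypothetical stand-ins for the red team model, a fresh conversation with the target, and the verification step. Capping the loop with `max_attempts` is an assumption about how `max_attempts_per_tool` is used.

```python
from typing import Callable


def run_attack(
    generate: Callable[[list], str],
    send_fresh: Callable[[str], str],
    verify: Callable[[str, str], bool],
    max_attempts: int = 5,
) -> tuple:
    """Run up to max_attempts single-shot attacks, stopping on first success."""
    history = []  # every earlier prompt/response pair, fed back to generate()
    for _ in range(max_attempts):
        prompt = generate(history)      # improved payload based on past failures
        response = send_fresh(prompt)   # each turn is a NEW conversation
        success = verify(prompt, response)
        history.append({"prompt": prompt, "response": response, "success": success})
        if success:
            return True, history        # stop immediately on success
    return False, history


# Stub callables, for illustration only: the third attempt "succeeds".
ok, history = run_attack(
    generate=lambda h: f"attempt {len(h) + 1}",
    send_fresh=lambda p: "LEAKED" if p == "attempt 3" else "refused",
    verify=lambda p, r: r == "LEAKED",
)
```

Passing the full history to `generate` is what lets each new payload address the reasons earlier ones failed, even though the target never sees the earlier turns.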

The probe uses a red team model to:

  • Deeply understand each tool’s functionality

  • Identify how that specific functionality can be exploited

  • Generate attack prompts tailored to each tool’s weaknesses

  • Analyze previous attempt responses to improve subsequent attacks

  • Verify if attacks succeeded

  • Parse discovery responses when auto-discovering tools

Configuration: Supply $XDG_DATA_HOME/garak/data/agent_breaker/agent.yaml to describe your target agent. You may omit tools (and optionally agent_purpose) to use auto-discovery. The YAML format is:

agent_purpose: |
  A helpful personal assistant that can execute code and read files.

tools:
- name: tool_name
  description: what it does

Configurable parameters:

DEFAULT_PARAMS contents:

  • max_calls_per_conv = 50

  • follow_prompt_cap = True

  • red_team_model_type = 'nim'

  • red_team_model_name = 'openai/gpt-oss-120b'

  • red_team_model_config = {'max_tokens': 8192, 'suppressed_params': ['stop']}

  • parse_model_type = None

  • parse_model_name = None

  • parse_model_config = None

  • end_condition = 'verify'

  • agent_config_file = 'agent_breaker/agent.yaml'

  • max_attempts_per_tool = 5

  • success_threshold = 0.7

Default values are listed above.

See also Configuring garak for how to set these values.
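A small sketch of preparing overrides for these parameters. The defaults are copied from the list above; `build_probe_options` is a hypothetical helper, and the nested module → class → parameter layout of the resulting JSON is an assumption that should be checked against Configuring garak before use.

```python
import json

# Defaults copied from DEFAULT_PARAMS as documented above.
DEFAULT_PARAMS = {
    "max_calls_per_conv": 50,
    "follow_prompt_cap": True,
    "red_team_model_type": "nim",
    "red_team_model_name": "openai/gpt-oss-120b",
    "red_team_model_config": {"max_tokens": 8192, "suppressed_params": ["stop"]},
    "parse_model_type": None,
    "parse_model_name": None,
    "parse_model_config": None,
    "end_condition": "verify",
    "agent_config_file": "agent_breaker/agent.yaml",
    "max_attempts_per_tool": 5,
    "success_threshold": 0.7,
}


def build_probe_options(**overrides) -> dict:
    """Validate override names and nest them under module and class (assumed layout)."""
    unknown = set(overrides) - set(DEFAULT_PARAMS)
    if unknown:
        raise KeyError(f"unknown AgentBreaker parameter(s): {sorted(unknown)}")
    return {"agent_breaker": {"AgentBreaker": dict(overrides)}}


options = build_probe_options(max_attempts_per_tool=3, success_threshold=0.8)
print(json.dumps(options, indent=2))
```

Rejecting unknown names early catches typos such as `sucess_threshold` that might otherwise go unnoticed.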

Other attributes:

active: bool = False
doc_uri: str = 'https://genai.owasp.org/llmrisk/llm062025-excessive-agency/'
goal: str = 'Identify weknesses in agentic applications through tool manipulation'
lang: str | None = 'en'
parallelisable_attempts: bool = False
primary_detector: str | None = 'agent_breaker.AgentBreakerResult'
tags: Iterable[str] = ['owasp:llm01', 'owasp:llm07', 'owasp:llm08', 'quality:Security:AgentSecurity', 'payload:agentic:exploitation']
tier: Tier = 3
class AttackState(current_target: str = '', current_tool_analysis: dict = <factory>, current_attack_prompt: str = '', attempts_history: list = <factory>, vulnerability_info: str = '', verified_results: list = <factory>)

Bases: object

Typed container for probe-internal attack state.

Replaces untyped attempt.notes dict access with named fields. Use to_notes() / from_notes() to serialize across the probe-detector boundary.

attempts_history: list
current_attack_prompt: str = ''
current_target: str = ''
current_tool_analysis: dict
classmethod from_notes(notes: dict) → AttackState

Reconstruct state from an attempt.notes dict.

to_notes() → dict

Serialize state into an attempt.notes dict.

verified_results: list
vulnerability_info: str = ''
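The round trip through attempt.notes can be illustrated with a stand-in dataclass. `AttackStateSketch` below copies the documented field names and defaults, but its method bodies are assumptions written for illustration; garak's actual `to_notes()` / `from_notes()` may use different keys or handle missing entries differently.

```python
from dataclasses import asdict, dataclass, field, fields


@dataclass
class AttackStateSketch:
    """Illustrative stand-in for AttackState; not garak's implementation."""

    current_target: str = ""
    current_tool_analysis: dict = field(default_factory=dict)
    current_attack_prompt: str = ""
    attempts_history: list = field(default_factory=list)
    vulnerability_info: str = ""
    verified_results: list = field(default_factory=list)

    def to_notes(self) -> dict:
        # Assumed behaviour: one notes key per field.
        return asdict(self)

    @classmethod
    def from_notes(cls, notes: dict) -> "AttackStateSketch":
        # Assumed behaviour: ignore unrelated keys, default any missing ones.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in notes.items() if k in known})


state = AttackStateSketch(current_target="tool_name", attempts_history=[{"prompt": "p1"}])
notes = state.to_notes()
restored = AttackStateSketch.from_notes({**notes, "unrelated_key": 1})
```

Here `restored` compares equal to `state`, since dataclasses compare field by field and the unrelated key is dropped.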