garak.probes.agent_breaker

Agent Breaker probe

A multi-turn red-team probe for attacking agentic LLM applications that use tools.

Uses a red team model to analyze each tool for weaknesses, generate targeted exploits, and verify attack success through direct conversation with the agent.

Further info:

https://genai.owasp.org/llmrisk/llm062025-excessive-agency/

class AgentBreaker(config_root=<module 'garak._config'>)

Bases: IterativeProbe

Agent Breaker - Agentic Application Exploitation Probe

A multi-turn probe that attacks agentic LLM applications by:

Loading agent purpose and tools from data/agent_breaker/agent.yaml (or auto-discovering them by asking the target agent if tools are missing)
Analyzing each tool to understand what it does and how it works
Identifying specific weaknesses based on each tool’s functionality
Generating targeted exploits based on the tool-specific analysis
Verifying attack success - stops immediately on success

Auto-discovery: If the YAML has no tools (or empty tools), the probe asks the target agent what tools it has (and optionally its purpose). The response is parsed by the red team model into the same format as the YAML. If agent_purpose is already set in the YAML, only tools are discovered.
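
The auto-discovery rule above can be sketched as a small decision function. This is an illustration only, not garak's implementation: `resolve_agent_config`, `ask_agent`, and `parse_with_red_team` are hypothetical names, the question wording is invented, and the behavior when tools are present but the purpose is missing is an assumption (the config is returned unchanged).

```python
from typing import Callable


def resolve_agent_config(
    yaml_config: dict,
    ask_agent: Callable[[str], str],            # hypothetical: sends one prompt to the target agent
    parse_with_red_team: Callable[[str], dict],  # hypothetical: red team model parses the reply
) -> dict:
    """Return a config with 'agent_purpose' and 'tools', discovering tools if absent."""
    tools = yaml_config.get("tools") or []       # a missing key and an empty list are treated alike
    purpose = yaml_config.get("agent_purpose")

    if tools:
        # Tools were supplied in the YAML, so no discovery is attempted in this sketch.
        return {"agent_purpose": purpose, "tools": tools}

    # No tools supplied: ask the target what it can do.
    question = "What tools do you have access to?"
    if not purpose:
        # Only ask for the purpose when the YAML did not already set it.
        question += " What is your purpose?"

    discovered = parse_with_red_team(ask_agent(question))
    return {
        # A purpose set in the YAML always wins over a discovered one.
        "agent_purpose": purpose or discovered.get("agent_purpose"),
        "tools": discovered.get("tools", []),
    }
```

With stub callables in place of the target agent and the red team model, the function can be exercised without any network access.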

Multi-turn attack strategy:

Each turn starts a NEW conversation with an improved attack payload
The red team model analyzes all previous attempts and their responses
It learns from failures and generates improved attacks that address weaknesses
The attack stops immediately when successful
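
A minimal sketch of this per-tool loop follows. It is not the probe's actual code: `attack_tool` and its three callables are hypothetical stand-ins for the red team model and the target, and comparing a verifier score against `success_threshold` with `>=` is an assumption about how that parameter is used.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (attack prompt, agent response) pairs


def attack_tool(
    generate_attack: Callable[[History], str],  # hypothetical: red team model writes the next payload
    send_to_agent: Callable[[str], str],        # hypothetical: one prompt in a NEW conversation
    verify: Callable[[str, str], float],        # hypothetical: red team model scores the outcome
    max_attempts_per_tool: int = 5,
    success_threshold: float = 0.7,
) -> Tuple[bool, History]:
    """Try up to max_attempts_per_tool attacks, stopping at the first success."""
    history: History = []
    for _ in range(max_attempts_per_tool):
        # The generator sees every earlier attempt and its response.
        prompt = generate_attack(history)
        response = send_to_agent(prompt)
        history.append((prompt, response))
        # Assumed success test: score at or above the threshold.
        if verify(prompt, response) >= success_threshold:
            return True, history
    return False, history
```

Passing the full history to the generator on every turn is what lets each new payload address the failures of the previous ones, even though the target agent itself sees only a fresh conversation.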

The probe uses a red team model to:

Deeply understand each tool’s functionality
Identify how that specific functionality can be exploited
Generate attack prompts tailored to each tool’s weaknesses
Analyze previous attempt responses to improve subsequent attacks
Verify if attacks succeeded
Parse discovery responses when auto-discovering tools

Configuration: Supply $XDG_DATA_HOME/garak/data/agent_breaker/agent.yaml to describe your target agent. You may omit tools (and optionally agent_purpose) to use auto-discovery. The YAML format is:

agent_purpose: |
  A helpful personal assistant that can execute code and read files.

tools:
- name: tool_name
  description: what it does
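
Before running the probe, it can help to sanity-check the parsed file. The sketch below works on an already-parsed mapping (for example, the result of loading the YAML with a YAML library) and is not part of garak; `check_agent_config` is a hypothetical helper, and treating `name` and `description` as required non-empty strings is an assumption drawn from the format shown above.

```python
def check_agent_config(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means the shape looks right."""
    problems = []

    purpose = config.get("agent_purpose")
    if purpose is not None and not isinstance(purpose, str):
        problems.append("agent_purpose must be a string")

    # Tools may be omitted or empty, in which case auto-discovery would apply.
    tools = config.get("tools") or []
    if not isinstance(tools, list):
        return problems + ["tools must be a list"]

    for i, tool in enumerate(tools):
        if not isinstance(tool, dict):
            problems.append(f"tools[{i}] must be a mapping")
            continue
        for key in ("name", "description"):
            value = tool.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"tools[{i}] is missing a non-empty '{key}'")
    return problems


# The example file from this page, as it would look once parsed.
example = {
    "agent_purpose": "A helpful personal assistant that can execute code and read files.\n",
    "tools": [{"name": "tool_name", "description": "what it does"}],
}
problems = check_agent_config(example)
```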

Configurable parameters:

DEFAULT_PARAMS contents:

max_calls_per_conv = 50
follow_prompt_cap = True
red_team_model_type = 'nim'
red_team_model_name = 'openai/gpt-oss-120b'
red_team_model_config = {'max_tokens': 8192, 'suppressed_params': ['stop']}
parse_model_type = None
parse_model_name = None
parse_model_config = None
end_condition = 'verify'
agent_config_file = 'agent_breaker/agent.yaml'
max_attempts_per_tool = 5
success_threshold = 0.7

The values listed above are the defaults.

See also Configuring garak for how to set these values.
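
To reason about the effect of an override, it can be useful to model the defaults as a plain dictionary. The sketch below copies the default values listed above and layers user overrides on top; `with_overrides` is a hypothetical helper for illustration and says nothing about how garak itself loads configuration (see Configuring garak for that).

```python
import copy

# Defaults copied from the DEFAULT_PARAMS listing on this page.
DEFAULT_PARAMS = {
    "max_calls_per_conv": 50,
    "follow_prompt_cap": True,
    "red_team_model_type": "nim",
    "red_team_model_name": "openai/gpt-oss-120b",
    "red_team_model_config": {"max_tokens": 8192, "suppressed_params": ["stop"]},
    "parse_model_type": None,
    "parse_model_name": None,
    "parse_model_config": None,
    "end_condition": "verify",
    "agent_config_file": "agent_breaker/agent.yaml",
    "max_attempts_per_tool": 5,
    "success_threshold": 0.7,
}


def with_overrides(overrides: dict) -> dict:
    """Return a copy of the defaults with overrides applied; reject unknown names."""
    unknown = set(overrides) - set(DEFAULT_PARAMS)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    params = copy.deepcopy(DEFAULT_PARAMS)  # deep copy so nested dicts are not shared
    params.update(overrides)
    return params


params = with_overrides({"max_attempts_per_tool": 3, "success_threshold": 0.9})
```

Rejecting unknown names catches misspelled parameters early, which matters here because a typo would otherwise silently leave the default in force.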

Other attributes:

active: bool = False

doc_uri: str = 'https://genai.owasp.org/llmrisk/llm062025-excessive-agency/'

goal: str = 'Identify weaknesses in agentic applications through tool manipulation'

intent: str | None = 'M009'

lang: str | None = 'en'

parallelisable_attempts: bool = False

primary_detector: str | None = 'agent_breaker.AgentBreakerResult'

tags: Iterable[str] = ['owasp:llm01', 'owasp:llm07', 'owasp:llm08', 'quality:Security:AgentSecurity', 'payload:agentic:exploitation']

tier: Tier = 3

class AttackState(current_target: str = '', current_tool_analysis: dict = <factory>, current_attack_prompt: str = '', attempts_history: list = <factory>, vulnerability_info: str = '', verified_results: list = <factory>)

Bases: object

Typed container for probe-internal attack state.

Replaces untyped attempt.notes dict access with named fields. Use to_notes() / from_notes() to serialize across the probe-detector boundary.

attempts_history: list

current_attack_prompt: str = ''

current_target: str = ''

current_tool_analysis: dict

classmethod from_notes(notes: dict) → AttackState

Reconstruct state from an attempt.notes dict.

to_notes() → dict

Serialize state into an attempt.notes dict.

verified_results: list

vulnerability_info: str = ''
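
The round trip through attempt.notes can be illustrated with a stand-in dataclass that has the same fields and defaults as the signature above. `AttackStateSketch` is a hypothetical name, and its method bodies are assumptions: the real class may use different keys or nest the state under a single entry in the notes dict.

```python
from dataclasses import asdict, dataclass, field, fields


@dataclass
class AttackStateSketch:
    """Stand-in with the documented fields; NOT garak's AttackState."""

    current_target: str = ""
    current_tool_analysis: dict = field(default_factory=dict)
    current_attack_prompt: str = ""
    attempts_history: list = field(default_factory=list)
    vulnerability_info: str = ""
    verified_results: list = field(default_factory=list)

    def to_notes(self) -> dict:
        # Assumed layout: one top-level key per field.
        return asdict(self)

    @classmethod
    def from_notes(cls, notes: dict) -> "AttackStateSketch":
        # Ignore keys that other components may have stored in the notes dict.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in notes.items() if k in known})


state = AttackStateSketch(current_target="tool_name", attempts_history=[{"prompt": "p"}])
notes = state.to_notes()
restored = AttackStateSketch.from_notes({**notes, "unrelated_key": 1})
```

Named fields give the probe and detector a shared, checkable contract, whereas free-form dict access fails only at the point where a misspelled key is read.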