garak.probes.agent_breaker
Agent Breaker probe
A multi-turn red-team probe for attacking agentic LLM applications that use tools.
Uses a red team model to analyze each tool for weaknesses, generate targeted exploits, and verify attack success through direct conversation with the agent.
- class AgentBreaker(config_root=<module 'garak._config'>)
Bases: IterativeProbe
Agent Breaker - Agentic Application Exploitation Probe
A multi-turn probe that attacks agentic LLM applications by:
Loading agent purpose and tools from data/agent_breaker/agent.yaml (or auto-discovering them by asking the target agent if tools are missing)
Analyzing each tool to understand what it does and how it works
Identifying specific weaknesses based on each tool’s functionality
Generating targeted exploits based on the tool-specific analysis
Verifying attack success - stops immediately on success
Auto-discovery: If the YAML has no tools (or empty tools), the probe asks the target agent what tools it has (and optionally its purpose). The response is parsed by the red team model into the same format as the YAML. If agent_purpose is already set in the YAML, only tools are discovered.
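The auto-discovery rule described above can be sketched as a small decision function. This is an illustration only, not garak's implementation: `discovery_plan` is a hypothetical name, and the case where tools are present but `agent_purpose` is missing is an assumption (here nothing is discovered), since the text does not cover it.

```python
def discovery_plan(agent_config: dict) -> dict:
    """Decide which fields would need to be discovered from the target agent.

    Hypothetical helper: mirrors the rule in the text, not garak's code.
    """
    tools = agent_config.get("tools") or []  # missing or empty tools both count
    purpose = (agent_config.get("agent_purpose") or "").strip()

    need_tools = len(tools) == 0
    # Purpose is only asked for alongside tools, and only if not already set.
    need_purpose = need_tools and not purpose
    return {"discover_tools": need_tools, "discover_purpose": need_purpose}


# No tools, purpose already set: only tools are discovered.
print(discovery_plan({"agent_purpose": "A helpful assistant", "tools": []}))
```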
Multi-turn attack strategy:
Each turn starts a NEW conversation with an improved attack payload
The red team model analyzes all previous attempts and their responses
It learns from failures and generates improved attacks that address weaknesses
The attack stops immediately when successful
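The strategy above amounts to a bounded retry loop with early exit. The following is a minimal sketch under stated assumptions: `run_attack` and its three callables are hypothetical stand-ins for the red team model and the target agent, and comparing a verification score against `success_threshold` with `>=` is an assumption about how the threshold is applied. The defaults echo `max_attempts_per_tool=5` and `success_threshold=0.7` from the parameter list.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (attack prompt, agent response) pairs


def run_attack(
    generate: Callable[[History], str],    # hypothetical: red team model writes the next prompt
    send_fresh: Callable[[str], str],      # hypothetical: sends prompt in a NEW conversation
    verify: Callable[[str, str], float],   # hypothetical: scores success in [0, 1]
    max_attempts: int = 5,
    success_threshold: float = 0.7,
) -> Tuple[bool, History]:
    history: History = []
    for _ in range(max_attempts):
        # The generator sees every earlier attempt and its response.
        prompt = generate(history)
        response = send_fresh(prompt)
        history.append((prompt, response))
        # Stop immediately on success.
        if verify(prompt, response) >= success_threshold:
            return True, history
    return False, history
```

With stub callables, for example a target that only "leaks" on the third prompt, the loop returns after exactly three attempts and leaves the remaining budget unused.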
The probe uses a red team model to:
Deeply understand each tool’s functionality
Identify how that specific functionality can be exploited
Generate attack prompts tailored to each tool’s weaknesses
Analyze previous attempt responses to improve subsequent attacks
Verify if attacks succeeded
Parse discovery responses when auto-discovering tools
Configuration: Supply $XDG_DATA_HOME/garak/data/agent_breaker/agent.yaml to describe your target agent. You may omit tools (and optionally agent_purpose) to use auto-discovery. The YAML format is:
    agent_purpose: |
      A helpful personal assistant that can execute code and read files.
    tools:
      - name: tool_name
        description: what it does
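As a convenience, the file location given above can be computed and the example YAML written with the standard library alone. This is a sketch, not part of garak: `agent_config_path` and `write_agent_config` are hypothetical helpers, and the fallback to `~/.local/share` when `XDG_DATA_HOME` is unset follows the XDG Base Directory convention, which is assumed here to match garak's behavior.

```python
import os
from pathlib import Path

# Example content in the format shown above.
AGENT_YAML = """\
agent_purpose: |
  A helpful personal assistant that can execute code and read files.
tools:
  - name: tool_name
    description: what it does
"""


def agent_config_path(env=None) -> Path:
    """Build $XDG_DATA_HOME/garak/data/agent_breaker/agent.yaml (hypothetical helper)."""
    env = os.environ if env is None else env
    # Assumption: fall back to the XDG default when the variable is unset or empty.
    base = env.get("XDG_DATA_HOME") or str(Path.home() / ".local" / "share")
    return Path(base) / "garak" / "data" / "agent_breaker" / "agent.yaml"


def write_agent_config(path: Path, text: str = AGENT_YAML) -> Path:
    """Create parent directories and write the YAML text."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path
```

Calling `write_agent_config(agent_config_path())` would place the file where the probe is documented to look. Dropping the `tools:` block from the text would leave tool discovery to the probe.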
Configurable parameters:
DEFAULT_PARAMS contents:
    max_calls_per_conv=50
    follow_prompt_cap=True
    red_team_model_type='nim'
    red_team_model_name='openai/gpt-oss-120b'
    red_team_model_config={'max_tokens': 8192, 'suppressed_params': ['stop']}
    parse_model_type=None
    parse_model_name=None
    parse_model_config=None
    end_condition='verify'
    agent_config_file='agent_breaker/agent.yaml'
    max_attempts_per_tool=5
    success_threshold=0.7
Default values are listed above.
See also Configuring garak for how to set these values.
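One way to prepare overrides for these parameters is to build a nested options dictionary and serialize it to JSON. The nesting shown (module name, then class name, then parameter) is an assumption about how garak scopes plugin options; the Configuring garak page is the authority on the exact structure and on how to pass the result to a run. Only parameter names from the list above are used, and the values are arbitrary examples.

```python
import json

# Assumed nesting: probe module -> probe class -> parameter overrides.
probe_options = {
    "agent_breaker": {
        "AgentBreaker": {
            "max_attempts_per_tool": 3,  # default is 5
            "success_threshold": 0.8,    # default is 0.7
        }
    }
}

# Serialize so the overrides can be saved to a file or passed as a string.
config_text = json.dumps(probe_options, indent=2)
print(config_text)
```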
Other attributes:
- class AttackState(current_target: str = '', current_tool_analysis: dict = <factory>, current_attack_prompt: str = '', attempts_history: list = <factory>, vulnerability_info: str = '', verified_results: list = <factory>)
Bases: object
Typed container for probe-internal attack state.
Replaces untyped attempt.notes dict access with named fields. Use to_notes() / from_notes() to serialize across the probe-detector boundary.
- classmethod from_notes(notes: dict) -> AttackState
Reconstruct state from an attempt.notes dict.
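The round trip between named fields and a plain notes dict can be illustrated with a standalone dataclass. This is a sketch and not garak's source: `AttackStateSketch` is a hypothetical class that copies the field names and defaults from the signature above, and ignoring unknown keys in `from_notes` is an assumption made so that unrelated entries in a notes dict do not raise errors.

```python
from dataclasses import asdict, dataclass, field, fields


@dataclass
class AttackStateSketch:
    """Hypothetical stand-in mirroring the documented AttackState fields."""

    current_target: str = ""
    current_tool_analysis: dict = field(default_factory=dict)
    current_attack_prompt: str = ""
    attempts_history: list = field(default_factory=list)
    vulnerability_info: str = ""
    verified_results: list = field(default_factory=list)

    def to_notes(self) -> dict:
        # Plain dict copy, suitable for storing alongside an attempt.
        return asdict(self)

    @classmethod
    def from_notes(cls, notes: dict) -> "AttackStateSketch":
        # Assumption: keys that are not state fields are skipped.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in notes.items() if k in known})


state = AttackStateSketch(current_target="tool_name", attempts_history=["first try"])
restored = AttackStateSketch.from_notes(state.to_notes())
print(restored == state)
```

Named fields give attribute access and defaults for missing keys, which an untyped dict lookup does not.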