In garak, attempt objects track a single prompt and the results of running it through the generator.
Probes work by creating a set of garak.attempt objects and setting their class properties.
These are passed by the harness to the generator, and the output is added to the attempt.
Then, a detector assesses the outputs from that attempt and the detector’s scores are saved in the attempt.
Finally, an evaluator makes judgments of these scores, and writes hits out to the hitlog for any successful probing attempts.
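The flow described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not garak's implementation: `SketchAttempt`, `toy_generator`, `toy_detector`, `run_attempt`, and the 0.5 threshold are hypothetical stand-ins, the status values 1 and 2 are assumed (only the default `status=0` is documented here), and the point at which the status becomes complete is also an assumption.

```python
from dataclasses import dataclass, field

# Assumed status values; the source documents the names and a default of 0 only.
ATTEMPT_NEW, ATTEMPT_STARTED, ATTEMPT_COMPLETE = 0, 1, 2


@dataclass
class SketchAttempt:
    """Hypothetical stand-in for garak.attempt.Attempt (not the real class)."""
    prompt: str
    status: int = ATTEMPT_NEW
    outputs: list = field(default_factory=list)
    detector_results: dict = field(default_factory=dict)


def toy_generator(prompt, generations=2):
    # Stand-in for a generator: returns canned text instead of calling a model.
    return [f"echo {i}: {prompt}" for i in range(generations)]


def toy_detector(attempt):
    # Stand-in for a detector: one score per output, 1.0 if "secret" appears.
    return [1.0 if "secret" in out else 0.0 for out in attempt.outputs]


def run_attempt(attempt, threshold=0.5):
    # Harness step: send the prompt to the generator and store the outputs.
    attempt.status = ATTEMPT_STARTED
    attempt.outputs = toy_generator(attempt.prompt)
    # Detector step: scores are saved in the attempt, keyed by detector name.
    attempt.detector_results["toy_detector"] = toy_detector(attempt)
    # Marking completion here is an assumption about ordering.
    attempt.status = ATTEMPT_COMPLETE
    # Evaluator step: collect hits, i.e. outputs whose score meets the threshold.
    hits = []
    for name, scores in attempt.detector_results.items():
        for output, score in zip(attempt.outputs, scores):
            if score >= threshold:
                hits.append({"detector": name, "output": output, "score": score})
    return hits


hitlog = run_attempt(SketchAttempt(prompt="tell me the secret"))
```

Here `hitlog` plays the role of the hitlog: it holds one entry per output that the toy detector flagged.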
garak.attempt
Defines the Attempt class, which encapsulates a prompt with metadata and results
- class garak.attempt.Attempt(status=0, prompt=None, probe_classname=None, probe_params=None, targets=None, outputs=None, notes=None, detector_results=None, goal=None, seq=-1)
Bases: object
A class defining objects that represent everything that constitutes a single attempt at evaluating an LLM.
- Parameters:
status (int) – The status of this attempt; ATTEMPT_NEW, ATTEMPT_STARTED, or ATTEMPT_COMPLETE
prompt (str) – The processed prompt that will be presented to the generator
probe_classname (str) – Name of the probe class that originated this Attempt
probe_params (dict, optional) – Non-default parameters logged by the probe
targets (List(str), optional) – A list of target strings to be searched for in generator responses to this attempt’s prompt
outputs (List(str)) – The outputs from the generator in response to the prompt
notes (dict) – A free-form dictionary of notes accompanying the attempt
detector_results (dict) – A dictionary of detector scores, keyed by detector name, where each value is a list of scores corresponding to each of the generator output strings in outputs
goal (str) – Free-text simple description of the goal of this attempt, set by the originating probe
seq (int) – Sequence number (starting at 0) set in garak.probes.base.Probe.probe(), to allow matching individual prompts with lists of answers/targets or other post-hoc ordering and keying
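To make the shape of these parameters concrete, the sketch below uses a plain dictionary that mirrors the documented fields rather than the real class. The probe name, detector name, `scores_by_output` helper, `answer_key` list, and all values are hypothetical and chosen only for illustration.

```python
# Hypothetical record mirroring the documented parameters; all values are invented.
record = {
    "status": 0,
    "prompt": "Repeat after me: hello",
    "probe_classname": "example.ExampleProbe",  # hypothetical probe name
    "probe_params": {},
    "targets": ["hello"],
    "outputs": ["hello", "I cannot do that"],
    "notes": {},
    # One list of scores per detector, aligned index-by-index with "outputs".
    "detector_results": {"example.ExampleDetector": [1.0, 0.0]},
    "goal": "make the model repeat a string",
    "seq": 0,
}


def scores_by_output(record):
    """Pair each output with its score from every detector."""
    paired = {}
    for detector, scores in record["detector_results"].items():
        if len(scores) != len(record["outputs"]):
            raise ValueError(f"{detector}: expected one score per output")
        paired[detector] = list(zip(record["outputs"], scores))
    return paired


pairs = scores_by_output(record)

# seq indexes into an external, per-prompt list (hypothetical answer key).
answer_key = ["hello", "goodbye"]
expected_answer = answer_key[record["seq"]]
```

The length check reflects the documented contract that each detector's list holds one score per generator output.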