attempt

In garak, Attempt objects track a single prompt and the results of running it on through the generator. Probes work by creating a set of garak.attempt.Attempt objects and setting their class properties. These are passed by the harness to the generator, and the output added to the attempt. Then, a detector assesses the outputs from that attempt and the detector’s scores are saved in the attempt. Finally, an evaluator makes judgments of these scores, and writes hits out to the hitlog for any successful probing attempts.

Within this, Converastion, Turn, and Message objects encapsulate conversational turns either sent to models (i.e. prompts) or returned from models (i.e. model output). garak uses an object to encapsulate this to allow easy switching with multimodal probes and generators.

garak.attempt

Defines the Attempt class, which encapsulates a prompt with metadata and results

class Attempt(status=0, prompt=None, probe_classname=None, probe_params=None, targets=None, notes=None, detector_results=None, goal=None, seq=-1, reverse_translation_outputs=None)Source

Bases: object

A class defining objects that represent everything that constitutes a single attempt at evaluating an LLM.

Parameters:
  • status (int) – The status of this attempt; ATTEMPT_NEW, ATTEMPT_STARTED, or ATTEMPT_COMPLETE

  • prompt (Message|Conversation) – The processed prompt that will presented to the generator

  • probe_classname (str) – Name of the probe class that originated this Attempt

  • probe_params (dict, optional) – Non-default parameters logged by the probe

  • targets (List(str), optional) – A list of target strings to be searched for in generator responses to this attempt’s prompt

  • outputs (List(Message)) – The outputs from the generator in response to the prompt

  • notes (dict) – A free-form dictionary of notes accompanying the attempt

  • detector_results (dict) – A dictionary of detector scores, keyed by detector name, where each value is a list of scores corresponding to each of the generator output strings in outputs

  • goal (str) – Free-text simple description of the goal of this attempt, set by the originating probe

  • seq (int) – Sequence number (starting 0) set in garak.probes.base.Probe.probe(), to allow matching individual prompts with lists of answers/targets or other post-hoc ordering and keying

  • conversations (List(Conversation)) – conversation turn histories

  • reverse_translation_outputs – The reverse translation of output based on the original language of the probe

  • reverse_translation_outputs – List(str)

Typical use:

  • An attempt tracks a seed prompt and responses to the prompt.

  • There is a 1:1 relationship between an attempt and a source prompt.

  • Attempts track all generations. This means conversations tracks many histories, one per generation.

  • this means messages tracks many histories, one per generation

  • For compatibility, setting Attempt.prompt sets just one turn and the prompt is unpacked later when output is set. We don’t know the number of generations to expect until some output arrives.

  • To keep alignment, generators must return lists of length generations.

Patterns and expectations for Attempt access:

  • .prompt returns the first user prompt.

  • .outputs returns the most recent model outputs.

Patterns and expectations for Attempt setting:

  • .prompt sets the first prompt, or fails if the first prompt is already set.

  • .outputs sets a new layer of model responses. Silently handles expansion of prompt to multiple histories. Prompt must be set before outputs are set.

property all_outputs: List[Message]Source
as_dict() dictSource

Converts the attempt to a dictionary.

property langSource

Language of first Message determines the language of the attempt

property outputs: List[Message]Source
outputs_for(lang) List[Message]Source

outputs for a known language

When “*” or None are passed returns the original model output

property prompt: Conversation | NoneSource
prompt_for(lang) ConversationSource

prompt for a known language

When “*” or None are passed returns the prompt passed to the model

class Conversation(turns: List[Turn] = <factory>, notes: dict | None = <factory>)Source

Bases: object

Class to maintain a sequence of Turn objects and, if relevant, apply conversation templates.

Parameters:
  • turns (list) – A list of Turns

  • notes (dict) – Free form dictionary of notes for the conversation

static from_dict(value: dict)Source
classmethod from_openai(conv: List[dict], notes: dict | None = None)Source
last_message(role=None) MessageSource

The last message exchanged in the conversation

Parameters:

role (str) – Optional, role to search for

notes: dict | None
turns: List[Turn]
class Message(text: str = None, lang: str = None, data_path: str | None = None, data_type: Tuple[str | None, str | None] | None=None, data_checksum: str | None = None, notes: dict | None = <factory>)Source

Bases: object

Object to represent a single message posed to or received from a generator

Messages can be prompts, replies, system prompts. While many prompts are text, they may also be (or include) images, audio, files, or even a composition of these. The Turn object encapsulates this flexibility. Message doesn’t yet support multiple attachments of the same type.

Parameters:
  • text (str) – Text of the prompt/response

  • data_path (Union[str, Path]) – Path to attachment

  • data_type (Tuple(str, str)) – (type, encoding) Mime type of data

  • data_checksum (str) – sha256 checksum of data loaded

  • data (Any) – Data to attach

  • lang (str (bcp47 language code or *)) – single language code for text content

  • notes (dict) – Free form dictionary of notes for the turn

property dataSource
data_checksum: str | None = None
data_path: str | None = None
data_type: Tuple[str | None, str | None] | None = None
lang: str = None
notes: dict | None
text: str = None
class Turn(role: str, content: Message)Source

Bases: object

Object to attach actor context to a message, denoted as taking a Turn in the conversation

Parameters:

role (str) – Role of the participant who issued the utterance Expected: [“system”, “user”, “assistant”]

content: Message
classmethod from_dict(value: dict)Source
role: str