analyze

Processing run results is a core part of getting actionable information out of a garak run. We provide a range of scripts and constructs under garak.analyze that assist in this.

Note that these tools expect the report JSONL format from the same version of garak. For example, scripts in garak.analyze under v0.14.0 expect to receive data generated under garak 0.14.0. There may be some graceful failure or backwards compatibility but this is not guaranteed, especially while garak is pre-v1.0. Patch releases are not expected to impact input/output formats – however, minor or major version bumps may come with updates that are not backwards compatible with older report files.
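The version expectation above can be checked before handing a report to an analysis script. The following is a minimal sketch, not part of garak: it assumes that some entry in the report carries a `garak_version` field (an assumption about the report layout), and the helper names `report_version` and `is_compatible` are hypothetical. It treats two versions as compatible when their major and minor components match, mirroring the note that patch releases are not expected to change input/output formats.

```python
import json
from typing import Iterable, Optional


def report_version(lines: Iterable[str]) -> Optional[str]:
    """Return the first garak_version value found in JSONL lines, or None."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        # Assumption: one of the report entries has a "garak_version" field.
        if isinstance(entry, dict) and "garak_version" in entry:
            return str(entry["garak_version"])
    return None


def is_compatible(report_ver: str, installed_ver: str) -> bool:
    """True when major and minor components match; the patch level may differ."""
    return report_ver.split(".")[:2] == installed_ver.split(".")[:2]
```

In practice the lines would come from an open report file, and the installed version from whatever garak release is in use. A mismatch is a prompt to regenerate the report or switch versions rather than proof that the script will fail.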

garak.analyze.aggregate_reports

Aggregate multiple garak reports on the same generator. Useful, for example, for assembling a single report from a run that was executed one probe at a time.

Invoke and see usage via command line with python -m garak.analyze.aggregate_reports

garak.analyze.analyze_log

Analyze a garak report.jsonl log file. Print out summary stats, and which prompts led to failures.

Invoke and see usage via command line with python -m garak.analyze.analyze_log
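To illustrate the kind of summary statistics that can be derived from a report, here is a minimal sketch. It is not the implementation of analyze_log. It assumes that the report contains entries with `entry_type` set to `eval` and with `probe`, `detector`, `passed` and `total` fields; these field names are assumptions about the report layout, and `summarize_evals` is a hypothetical helper.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def summarize_evals(lines: Iterable[str]) -> Dict[Tuple[str, str], float]:
    """Map each (probe, detector) pair to its pass rate across eval entries."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for line in lines:
        if not line.strip():
            continue
        entry = json.loads(line)
        # Assumption: evaluation results are stored in "eval" entries.
        if entry.get("entry_type") != "eval":
            continue
        key = (entry["probe"], entry["detector"])
        passed[key] += entry["passed"]
        total[key] += entry["total"]
    # Skip pairs with no evaluated items to avoid dividing by zero.
    return {key: passed[key] / total[key] for key in total if total[key] > 0}
```

Passing the lines of an open report file to `summarize_evals` yields one pass rate per probe/detector pair, which can then be sorted to find the weakest results.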

garak.analyze.calibration

Module for code related to calibrating garak (i.e. calculating the bases for relative scores and Z-scores).

class Calibration(calibration_path: None | str | Path = None)

Bases: object

Helper for managing probe/detector score calibration data processing

get_z_score(probe_module: str, probe_classname: str, detector_module: str, detector_classname: str, score: float) → float | None
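As a rough illustration of what a Z-score lookup with this signature might compute, the sketch below standardises a score against a per-probe/detector mean and standard deviation. It is not the garak implementation: the dictionary layout, the key format, the `mu` and `sigma` field names, and the function name `z_score` are all assumptions made for the example. It returns None when no calibration entry exists, in line with the `float | None` return type.

```python
from typing import Dict, Optional


def z_score(
    calibration: Dict[str, Dict[str, float]],
    probe_module: str,
    probe_classname: str,
    detector_module: str,
    detector_classname: str,
    score: float,
) -> Optional[float]:
    """Standardise score against calibration data, or return None if absent."""
    # Assumption: entries are keyed as "probe_module.Probe/detector_module.Detector".
    key = f"{probe_module}.{probe_classname}/{detector_module}.{detector_classname}"
    entry = calibration.get(key)
    # No calibration data, or zero spread, means no meaningful Z-score.
    if entry is None or entry["sigma"] == 0:
        return None
    return (score - entry["mu"]) / entry["sigma"]
```

A positive result means the score is above the calibration mean for that probe/detector pair, and a negative result means it is below.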

garak.analyze.count_tokens

Count the number of characters sent and received, based on prompts, outputs, and generations.

Invoke and see usage via command line with python -m garak.analyze.count_tokens
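The idea of counting characters sent and received can be sketched as follows. This is not the count_tokens script. It assumes that attempts are stored as entries with `entry_type` set to `attempt`, a plain-string `prompt` field and a list of plain-string `outputs`; these are assumptions about the report layout, and `count_characters` is a hypothetical helper. It also does not deduplicate attempts that appear more than once in a report.

```python
import json
from typing import Iterable, Tuple


def count_characters(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (characters sent, characters received) over attempt entries."""
    sent = 0
    received = 0
    for line in lines:
        if not line.strip():
            continue
        entry = json.loads(line)
        # Assumption: prompts and outputs live in "attempt" entries.
        if entry.get("entry_type") != "attempt":
            continue
        prompt = entry.get("prompt")
        if isinstance(prompt, str):
            sent += len(prompt)
        for output in entry.get("outputs", []):
            # Non-string outputs (e.g. None) are skipped in this sketch.
            if isinstance(output, str):
                received += len(output)
    return sent, received
```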

garak.analyze.get_tree

If the run used a TreeSearchProbe (defined in garak.probes.base), display the tree of items explored during the run.

Invoke and see usage via command line with python -m garak.analyze.get_tree

garak.analyze.misp

Reporting on category-level information; categories denoted internally in MISP format.

Invoke and see usage via command line with python -m garak.analyze.misp

garak.analyze.perf_stats

Calculate a garak calibration from a set of report.jsonl outputs. For more details, see Calibration.

Invoke and see usage via command line with python -m garak.analyze.perf_stats

garak.analyze.qual_review

Generate a qualitative review of a garak report, highlighting heavily failing probes in a Markdown report. Gives ten positive and ten negative examples from failing probes. Takes a report.jsonl and an optional bag.json (data/calibration/calibration.json by default) as input.

Invoke and see usage via command line with python -m garak.analyze.qual_review

garak.analyze.report_avid

Prints an AVID (https://avidml.org/) report given a garak report in JSONL format.

Invoke and see usage via command line with python -m garak.analyze.report_avid

garak.analyze.report_digest

Generate reports from garak report JSONL. See the module's argparse configuration for usage.

Invoke and see usage via command line with python -m garak.analyze.report_digest

append_report_object(reportfile: IO, object: dict)

build_digest(report_filename: str, config=garak._config)

build_html(digest: dict, config=garak._config) → str

plugin_docstring_to_description(docstring)

garak.analyze.tbsa

Generate a single numeric score for a run using tier-based score aggregation. Note that this score is lossy and difficult to make comparable – it will change with different configs and across different garak versions.

Invoke and see usage via command line with python -m garak.analyze.tbsa

Read full details: Tier-Based Score Aggregation