garak.probes.misleading

Misleading claims

Present targets with false claims and check whether they debunk or reject them.

This module houses probes that attempt to mislead the target.

Will a model refute false claims, or will it go along with false information and perform tasks based on it? This checks for a subset of hallucination types in which the input contains false information that the system is expected to deny, refute, or contradict.

class FalseAssertion(config_root=<module 'garak._config'>)

Bases: Probe

Does the model reject [questions based on] false claims?

Poses a number of false statements to the model and checks whether the model refutes the statement, contradicts the statement, or issues a safety mitigation message.
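
As a rough illustration of the keyword-style check described above, the sketch below treats an output as a refutation when it contains one of a few marker phrases. The names `REFUTATION_MARKERS`, `looks_like_refutation`, and `score`, as well as the phrase list itself, are hypothetical and chosen for illustration; this is not garak's detector implementation.

```python
# Hypothetical marker phrases, for illustration only (not garak's list).
REFUTATION_MARKERS = (
    "actually,",
    "that is not",
    "is not true",
    "incorrect",
    "i'm sorry",
)


def looks_like_refutation(output: str) -> bool:
    """Return True if the output contains any marker phrase (case-insensitive)."""
    text = output.lower()
    return any(marker in text for marker in REFUTATION_MARKERS)


def score(output: str) -> float:
    """Assumed convention: 0.0 if the model pushed back, 1.0 if it went along."""
    return 0.0 if looks_like_refutation(output) else 1.0
```

A substring match like this is cheap but brittle: it misses refutations phrased differently and can fire on unrelated uses of the same words, which is one plausible reason to pair keyword checks with model-based or NLI-based detectors.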

active: bool = True
doc_uri: str = 'https://arxiv.org/html/2304.13734'
extended_detectors: Iterable[str] = ['misleading.MustRefuteClaimKW', 'mitigation.MitigationBypass', 'misleading.MustContradictNLI']
goal: str = 'make the model expand on a false assertion'
lang: str | None = 'en'
primary_detector: str | None = 'misleading.MustRefuteClaimModel'
tags: Iterable[str] = ['avid-effect:performance:P0204', 'owasp:llm09', 'quality:Robustness:GenerativeMisinformation', 'payload:hallucination']
tier: Tier = 2
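
The detector names listed above follow a `module.ClassName` pattern. The sketch below copies those values and shows one way to gather the primary and extended detectors into a single ordered list and split each name into its parts. The helpers `all_detectors` and `split_detector_name` are hypothetical, and how garak itself resolves and schedules detectors is not covered here.

```python
from typing import Iterable, List, Optional, Tuple

# Values copied from the attribute listing above.
primary_detector = "misleading.MustRefuteClaimModel"
extended_detectors = [
    "misleading.MustRefuteClaimKW",
    "mitigation.MitigationBypass",
    "misleading.MustContradictNLI",
]


def all_detectors(primary: Optional[str], extended: Iterable[str]) -> List[str]:
    """Return the primary detector first, then extended ones, without duplicates."""
    names: List[str] = [primary] if primary else []
    for name in extended:
        if name not in names:
            names.append(name)
    return names


def split_detector_name(name: str) -> Tuple[str, str]:
    """Split 'module.ClassName' into ('module', 'ClassName')."""
    module, _, class_name = name.partition(".")
    return module, class_name


if __name__ == "__main__":
    for detector in all_detectors(primary_detector, extended_detectors):
        print(split_detector_name(detector))
```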