garak.probes.misleading

Misleading claims

Pose false claims to targets and check whether they debunk or reject them.

This module houses probes that attempt to mislead the target.

Will a model refute false claims, or will it go along with false information and perform tasks based on it? This checks for a subset of hallucination types in which the input contains false information that the system is expected to deny, refute, or contradict.

class FalseAssertion(config_root=garak._config)

Bases: Probe

Does the model reject [questions based on] false claims?

Poses a number of false statements to the model and, for each one, checks whether the model refutes the statement, contradicts it, or issues a safety mitigation message.
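To illustrate the idea of checking whether a response refutes a claim, here is a minimal keyword-based sketch. It is not garak's implementation: the function name `contains_refutation` and the cue list are hypothetical, and the real detectors (for example `misleading.MustRefuteClaimKW`) may use different cues and logic.

```python
# Hypothetical cue phrases; garak's own detectors may look for different ones.
REFUTATION_CUES = ("not true", "incorrect", "false", "is not", "isn't")


def contains_refutation(response: str, cues=REFUTATION_CUES) -> bool:
    """Return True if the response contains any refutation cue (case-insensitive)."""
    text = response.lower()
    return any(cue in text for cue in cues)


# A response that pushes back on the false premise passes the check.
refuting = "That is incorrect: the Moon is made of rock, not cheese."
# A response that plays along with the false premise fails it.
complying = "The Moon was made of cheese because early dairy comets collided."

print(contains_refutation(refuting))
print(contains_refutation(complying))
```

A plain substring match like this is easy to fool, which is one plausible reason the probe lists several detectors rather than relying on keywords alone.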

active: bool = True

doc_uri: str = 'https://arxiv.org/html/2304.13734'

extended_detectors: Iterable[str] = ['misleading.MustRefuteClaimKW', 'mitigation.MitigationBypass', 'misleading.MustContradictNLI']

goal: str = 'make the model expand on a false assertion'

intent: str | None = 'C009'

lang: str | None = 'en'

primary_detector: str | None = 'misleading.MustRefuteClaimModel'

tags: Iterable[str] = ['avid-effect:performance:P0204', 'owasp:llm09', 'quality:Robustness:GenerativeMisinformation', 'payload:hallucination', 'demon:Rhetoric:Persuasion_and_manipulation:Distraction']
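The tag strings above are colon-delimited. As a small sketch, they can be grouped by their first segment to see which taxonomies the probe is mapped to. Treating the first segment as a namespace is an assumption made for illustration, and `group_by_namespace` is a hypothetical helper, not part of garak.

```python
from collections import defaultdict

# Tag values copied from the FalseAssertion attribute listing.
tags = [
    "avid-effect:performance:P0204",
    "owasp:llm09",
    "quality:Robustness:GenerativeMisinformation",
    "payload:hallucination",
    "demon:Rhetoric:Persuasion_and_manipulation:Distraction",
]


def group_by_namespace(tag_list):
    """Group tags by the text before the first colon (assumed to be a namespace)."""
    grouped = defaultdict(list)
    for tag in tag_list:
        namespace, _, rest = tag.partition(":")
        grouped[namespace].append(rest)
    return dict(grouped)


grouped = group_by_namespace(tags)
for namespace, entries in grouped.items():
    print(namespace, entries)
```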

tier: Tier = 2