garak.probes.misleading

Misleading claims

Will a model refute false claims, or will it go along with false information and perform tasks based on it? These probes check for a subset of hallucination types: cases where the prompt contains false information that the system is expected to deny, refute, or contradict.

class garak.probes.misleading.FalseAssertion50(config_root=<module 'garak._config'>)

Bases: Probe

Does the model reject [questions based on] false claims?

Poses a number of false statements to the model and checks whether the model refutes each statement, contradicts it, or issues a safety mitigation message.
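The check described above can be illustrated with a minimal keyword-based sketch. This is not garak's detector code: the names `REFUTATION_MARKERS`, `refutes_claim`, and `score_outputs` are hypothetical, the marker list is an assumption chosen for illustration, and the scoring convention (1.0 when the model goes along with the claim, 0.0 when it refutes it) is likewise assumed.

```python
# Hypothetical keyword-based refutation check, for illustration only.
# None of these names come from garak; the marker list is an assumption.
REFUTATION_MARKERS = (
    "is not",
    "isn't",
    "are not",
    "aren't",
    "incorrect",
    "false",
    "not true",
    "actually",
)


def refutes_claim(output: str) -> bool:
    """Return True if the output contains any assumed refutation marker."""
    lowered = output.lower()
    return any(marker in lowered for marker in REFUTATION_MARKERS)


def score_outputs(outputs: list[str]) -> list[float]:
    """Assumed convention: 1.0 = went along with the claim, 0.0 = refuted it."""
    return [0.0 if refutes_claim(text) else 1.0 for text in outputs]


# Made-up model responses to the false claim "The Sun orbits the Earth."
sample_outputs = [
    "That is incorrect: the Earth orbits the Sun.",
    "Yes, the Sun completes one orbit of the Earth every day.",
]
sample_scores = score_outputs(sample_outputs)
```

Plain substring matching is crude (for example, "is not" also occurs inside "this notion"), which may be one reason the probe recommends several detectors rather than relying on a keyword check alone.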

bcp47: Iterable[str] | None = 'en'
doc_uri: str = 'https://arxiv.org/abs/2304.13734'
goal: str = 'make the model expand on a false assertion'
recommended_detector: Iterable[str] = ['misleading.MustRefuteClaimModel', 'misleading.MustRefuteClaimKW', 'mitigation.MitigationBypass', 'misleading.MustContradictNLI']
tags: Iterable[str] = ['avid-effect:performance:P0204', 'owasp:llm09', 'quality:Robustness:GenerativeMisinformation', 'payload:hallucination']
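The attribute values above appear to follow two naming patterns: detector names read as `module.Class`, and tags read as `taxonomy:value`. The sketch below groups them on that assumption. The helper `group_by_prefix` is hypothetical and not part of garak; the lists are copied from the attributes listed above.

```python
from collections import defaultdict

# Values copied from the class attributes listed above.
recommended_detector = [
    "misleading.MustRefuteClaimModel",
    "misleading.MustRefuteClaimKW",
    "mitigation.MitigationBypass",
    "misleading.MustContradictNLI",
]
tags = [
    "avid-effect:performance:P0204",
    "owasp:llm09",
    "quality:Robustness:GenerativeMisinformation",
    "payload:hallucination",
]


def group_by_prefix(names, sep):
    """Hypothetical helper: group names by the text before the first separator."""
    grouped = defaultdict(list)
    for name in names:
        prefix, _, rest = name.partition(sep)
        grouped[prefix].append(rest)
    return dict(grouped)


# Assumes detector names are "module.Class" and tags are "taxonomy:value".
detectors_by_module = group_by_prefix(recommended_detector, ".")
tags_by_taxonomy = group_by_prefix(tags, ":")
```

Grouped this way, three of the four recommended detectors fall under `misleading` and one under `mitigation`, which matches the description of the probe accepting either a refutation or a safety mitigation message.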