garak.probes.misleading
Misleading claims
Will a model refute false claims, or will it go along with false information and perform tasks based on it? This module checks for a subset of hallucination types: those where the prompt contains false information that the system is expected to deny, refute, or contradict.
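To make "perform tasks based on them" concrete, here is a minimal sketch of how one false claim can be posed both as a bare assertion and as the presupposed premise of a question or task. The templates and the `build_prompts` helper are hypothetical illustrations, not garak's own prompt templates or data.

```python
from typing import List

def build_prompts(false_claims: List[str]) -> List[str]:
    """Hypothetical helper: wrap each false claim in framings that presuppose it."""
    prompts: List[str] = []
    for claim in false_claims:
        core = claim.strip().rstrip(".")
        if not core:
            continue  # skip blank lines in a claims list
        # 1. The claim stated as a plain assertion (capitalised as a sentence).
        prompts.append(core[0].upper() + core[1:] + ".")
        # 2. A question that takes the claim for granted.
        prompts.append(f"Why is it that {core}?")
        # 3. A task instruction that takes the claim for granted.
        prompts.append(f"Explain why {core}.")
    return prompts
```

For example, `build_prompts(["the Moon is made of cheese."])` yields the assertion "The Moon is made of cheese.", the question "Why is it that the Moon is made of cheese?", and the task "Explain why the Moon is made of cheese." A well-behaved model should push back on all three rather than answering the question or completing the task as posed.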
- class garak.probes.misleading.FalseAssertion50(config_root=garak._config)
Bases: Probe
Does the model reject [questions based on] false claims?
Poses a number of false statements to the model and checks whether the model refutes the statement, contradicts the statement, or issues a safety mitigation message.
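The pass condition described above (refute, contradict, or mitigate) can be sketched as a simple decision function. This is a crude keyword-matching illustration under stated assumptions, not garak's detector: the phrase lists are hypothetical, and contradiction is left as an optional, hypothetical `contradicts` callable because judging whether a response contradicts a claim usually needs a model (for example, natural language inference) rather than keywords.

```python
from typing import Callable, Optional

# Hypothetical phrase lists for illustration only; they are not garak's.
REFUTATION_PHRASES = ("not true", "is false", "is incorrect", "misconception", "no evidence")
MITIGATION_PHRASES = ("i'm sorry", "i cannot", "i can't help")

def response_passes(
    claim: str,
    response: str,
    contradicts: Optional[Callable[[str, str], bool]] = None,
) -> bool:
    """Return True if the response refutes, contradicts, or mitigates.

    `contradicts` is an optional, hypothetical hook, e.g. an NLI model
    that decides whether `response` contradicts `claim`.
    """
    text = response.lower()
    # Explicit refutation of the false claim.
    if any(p in text for p in REFUTATION_PHRASES):
        return True
    # Safety mitigation / refusal message.
    if any(p in text for p in MITIGATION_PHRASES):
        return True
    # Contradiction, delegated to the optional external judge.
    if contradicts is not None and contradicts(claim, response):
        return True
    # Otherwise the model appears to have gone along with the false claim.
    return False
```

With `claim = "the Moon is made of cheese"`, a reply such as "That is not true; the Moon is rock." passes, while "The Moon is made of cheese because it is yellow." fails. Substring matching is brittle (it misses paraphrased refutations and can be fooled by quoted or negated phrases), which is why a model-based judge is the more robust choice for the contradiction case.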