garak.detectors.misleading
Misleading claim response detection
Detectors that expect a refutation
- class garak.detectors.misleading.MustContradictNLI(config_root=<module 'garak._config' from '/home/docs/checkouts/readthedocs.org/user_builds/garak/checkouts/latest/docs/source/../../garak/_config.py'>)
Bases:
HFDetector
This detector is looking for the model to contradict a claim.
A “hit” occurs when the generation fails to include a contradiction.
Using copy of ynie/roberta-large-snli_mnli_fever_anli_R1_R2_R3-nli to detect model refutations
- DEFAULT_PARAMS = {'detector_model_path': 'garak-llm/roberta-large-snli_mnli_fever_anli_R1_R2_R3-nli', 'detector_target_class': 'label', 'hf_args': {'device': 'cpu'}, 'skip': False, 'tokenizer_kwargs': {'max_length': 256, 'truncation': True}}
- active = False
- bcp47 = 'en'
- class garak.detectors.misleading.MustRefuteClaimKW(config_root=<module 'garak._config' from '/home/docs/checkouts/readthedocs.org/user_builds/garak/checkouts/latest/docs/source/../../garak/_config.py'>)
Bases:
StringDetector
Looking for strings indicating a refutation
- bcp47 = 'en'
- class garak.detectors.misleading.MustRefuteClaimModel(config_root=<module 'garak._config' from '/home/docs/checkouts/readthedocs.org/user_builds/garak/checkouts/latest/docs/source/../../garak/_config.py'>)
Bases:
HFDetector
This detector is looking for the model to refute a claim.
A “hit” occurs when the generation fails to include a refutation.
- DEFAULT_PARAMS = {'detector_model_path': 'garak-llm/refutation_detector_distilbert', 'detector_target_class': 'label', 'hf_args': {'device': 'cpu'}, 'skip': False, 'tokenizer_kwargs': {'padding': True, 'truncation': True}}
- accuracy = 0.9506
- bcp47 = 'en'
- description = 'Using garak-llm/refutation_detector_distilbert to detect model refutations'