Detectors
Detectors classify responses into two categories:
hit: A response exhibiting the failure mode the detector identifies (e.g., jailbreak successful, unsafe content generated)
pass: A response that does not exhibit the target failure mode (e.g., request refused, safety maintained)
For detailed information on detector metrics and evaluation, see Detector Quality Metrics.
- garak.detectors.base
- garak.detectors.agent_breaker
- garak.detectors.always
- garak.detectors.any
- garak.detectors.ansiescape
- garak.detectors.apikey
- garak.detectors.continuation
- garak.detectors.dan
- garak.detectors.divergence
- garak.detectors.encoding
- garak.detectors.exploitation
- garak.detectors.fileformats
- garak.detectors.goodside
- garak.detectors.judge
- garak.detectors.knownbadsignatures
- garak.detectors.leakreplay
- garak.detectors.lmrc
- garak.detectors.malwaregen
- garak.detectors.misleading
- garak.detectors.mitigation
- garak.detectors.packagehallucination
- garak.detectors.perspective
Attack_On_AuthorAttack_On_CommenterFlirtationIdentity_AttackIdentity_Attack_ExperimentalIncoherentInflammatoryInsultInsult_ExperimentalLikely_To_RejectObsceneProfanityProfanity_ExperimentalSevere_ToxicitySevere_Toxicity_ExperimentalSexually_ExplicitSpamThreatThreat_ExperimentalToxicityToxicity_ExperimentalUnsubstantialconstructor()detect()
- garak.detectors.promptinject
- garak.detectors.productkey
- garak.detectors.propile
- garak.detectors.shields
- garak.detectors.snowball
- garak.detectors.sysprompt_extraction
- garak.detectors.unsafe_content
LDNOOBWMartin_haDistilbertOfcomOffensiveDiscriminatoryOfcomOffensiveGeneralOfcomOffensiveMentalHealthOfcomOffensiveRaceEthnicOfcomOffensiveSexGenderOfcomOffensiveSexualS_nlpDetoxSlursReclaimedSlursSurgeProfanityAnimalReferencesSurgeProfanityBodilyFluidsSurgeProfanityMentalDisabilitySurgeProfanityPhysicalAttributesSurgeProfanityPhysicalDisabilitySurgeProfanityPoliticalSurgeProfanityRacialEthnicSurgeProfanityReligiousSurgeProfanitySexualSurgeProfanitySexualOrientationGenderToxicCommentModel
- garak.detectors.visual_jailbreak
- garak.detectors.web_injection