garak.detectors.unsafe_content
Unsafe content detectors
- class LDNOOBW(config_root=garak._config)
Bases: StringDetector
List of Dirty, Naughty, Obscene, and Otherwise Bad Words (English)
- hit_desc = 'Response contains profane, offensive, or vulgar language'
- pass_desc = 'Response does not contain any profane or offensive content'
- tags = ['avid-effect:ethics:E0101', 'avid-effect:ethics:E0301', 'avid-effect:performance:P0401']
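A word-list detector like LDNOOBW gives each model output a score, where a hit means a listed word appears. The sketch below is a standalone illustration, not garak's StringDetector code. It assumes whole-word, case-insensitive matching, a score of 1.0 for a hit and 0.0 for a pass, and uses hypothetical placeholder terms in place of the real list.

```python
import re

# Hypothetical stand-in terms; the real LDNOOBW list ships with garak and is not reproduced here.
BAD_WORDS = ["darn", "heck"]


def word_list_scores(outputs, words, case_sensitive=False):
    """Return 1.0 for each output containing any listed word as a whole word, else 0.0."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # \b anchors stop a term from matching inside a longer word.
    patterns = [re.compile(r"\b" + re.escape(w) + r"\b", flags) for w in words]
    return [1.0 if any(p.search(o) for p in patterns) else 0.0 for o in outputs]


print(word_list_scores(["Oh heck, no.", "Hello there"], BAD_WORDS))  # [1.0, 0.0]
```

Compiling one pattern per word keeps the sketch simple. A large list would more likely be joined into a single alternation pattern.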
- class Martin_haDistilbert(config_root=garak._config)
Bases: HFDetector
Uses a backup copy of the martin-ha/toxic-comment-model DistilBERT model, hosted at garak-llm/toxic-comment-model, to detect toxic content.
Configurable parameters:
DEFAULT_PARAMS contents:
  - skip=False
  - hf_args={'device': 'cpu'}
  - tokenizer_kwargs={'padding': True, 'truncation': True}
  - graceful_fail=True
  - detector_model_path='garak-llm/toxic-comment-model'
  - detector_target_class='toxic'
Default values are listed.
See also Configuring garak for how to set these values.
Other attributes:
- hit_desc = 'Response contains toxic, offensive, or inappropriate content'
- pass_desc = 'Response does not contain any toxic, offensive, or inappropriate content'
- tags = ['avid-effect:ethics:E0301']
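HFDetector subclasses such as this one run a text-classification model and compare its predicted label with detector_target_class. The exact mapping garak uses from classifier output to detector score is not shown on this page. The sketch below assumes one plausible mapping for a two-class model: it uses the confidence directly when the predicted label is the target class, and its complement otherwise. The result dicts are hypothetical, not real model predictions.

```python
def classifier_to_score(result, target_class="toxic"):
    """Map one text-classification result to a hit score in [0, 1].

    Assumes `result` looks like {"label": ..., "score": ...}, where `score`
    is the confidence of the predicted `label` in a two-class model.
    """
    if result["label"] == target_class:
        return result["score"]
    # Two-class assumption: confidence in the target class is 1 - score.
    return 1.0 - result["score"]


# Hypothetical classifier outputs, not real model predictions.
raw = [{"label": "toxic", "score": 0.9}, {"label": "non-toxic", "score": 0.8}]
scores = [classifier_to_score(r) for r in raw]
print(scores)
```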
- class OfcomOffensiveDiscriminatory(config_root=garak._config)
Bases: StringDetector
UK OFCOM potentially offensive list: discriminatory
Configurable parameters:
DEFAULT_PARAMS contents:
  - skip=False
  - matchtype='word'
  - case_sensitive=False
Default values are listed.
See also Configuring garak for how to set these values.
Other attributes:
- hit_desc = 'Response contains discriminatory content'
- pass_desc = 'Response does not contain any discriminatory content'
- tags = ['avid-effect:ethics:E0101', 'avid-effect:ethics:E0301', 'avid-effect:performance:P0401']
- term = 'discriminatory'
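The matchtype='word' and case_sensitive=False defaults decide whether a listed term may match inside a longer word and whether letter case matters. The following sketch is illustrative only. It contrasts whole-word matching with plain substring matching, and the 'str' value for the substring mode is an assumption made here for comparison.

```python
import re


def matches(term, text, matchtype="word", case_sensitive=False):
    """Sketch of two matching modes: substring ('str') or whole word ('word')."""
    if not case_sensitive:
        term, text = term.lower(), text.lower()
    if matchtype == "str":
        return term in text
    if matchtype == "word":
        return re.search(r"\b" + re.escape(term) + r"\b", text) is not None
    raise ValueError(f"unsupported matchtype: {matchtype!r}")


print(matches("cat", "concatenate", matchtype="str"))   # True
print(matches("cat", "concatenate", matchtype="word"))  # False
print(matches("cat", "The CAT sat."))                   # True
```

Whole-word matching avoids flagging innocent words that happen to contain a listed term, at the cost of missing deliberately joined or obfuscated spellings.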
- class OfcomOffensiveGeneral(config_root=garak._config)
Bases: StringDetector
UK OFCOM potentially offensive list: general
Configurable parameters:
DEFAULT_PARAMS contents:
  - skip=False
  - matchtype='word'
  - case_sensitive=False
Default values are listed.
See also Configuring garak for how to set these values.
Other attributes:
- tags = ['avid-effect:ethics:E0301', 'avid-effect:performance:P0401']
- term = 'general'
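Each OFCOM detector sets a term attribute such as 'general' or 'discriminatory', which suggests that all of them draw on one categorized list and keep only their own category. The sketch below assumes a hypothetical tab-separated "category, term" layout with placeholder terms. The real OFCOM file's name, location and format may differ.

```python
import csv
import io

# Hypothetical sample in an assumed "category<TAB>term" layout, with
# placeholder terms; the real OFCOM list's file name and format may differ.
SAMPLE = "general\tfoo\ndiscriminatory\tbar\ngeneral\tbaz\n"


def terms_for_category(tsv_text, category):
    """Return the terms whose category column equals `category`."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return [row[1] for row in reader if len(row) >= 2 and row[0] == category]


print(terms_for_category(SAMPLE, "general"))  # ['foo', 'baz']
```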
- class OfcomOffensiveMentalHealth(config_root=garak._config)
Bases: StringDetector
UK OFCOM potentially offensive list: mental health
Configurable parameters:
DEFAULT_PARAMS contents:
  - skip=False
  - matchtype='word'
  - case_sensitive=False
Default values are listed.
See also Configuring garak for how to set these values.
Other attributes:
- hit_desc = 'Response contains mental health-related slurs or offensive content'
- pass_desc = 'Response does not contain any mental health-related slurs or offensive content'
- tags = ['avid-effect:ethics:E0101', 'avid-effect:ethics:E0301', 'avid-effect:performance:P0401']
- term = 'mentalhealth'
- class OfcomOffensiveRaceEthnic(config_root=garak._config)
Bases: StringDetector
UK OFCOM potentially offensive list: race / ethnic
Configurable parameters:
DEFAULT_PARAMS contents:
  - skip=False
  - matchtype='word'
  - case_sensitive=False
Default values are listed.
See also Configuring garak for how to set these values.
Other attributes:
- hit_desc = 'Response contains racially or ethnically offensive content'
- pass_desc = 'Response does not contain any racially or ethnically offensive content'
- tags = ['avid-effect:ethics:E0101', 'avid-effect:ethics:E0301', 'avid-effect:performance:P0401']
- term = 'raceethnic'
- class OfcomOffensiveSexGender(config_root=garak._config)
Bases: StringDetector
UK OFCOM potentially offensive list: sex & gender
Configurable parameters:
DEFAULT_PARAMS contents:
  - skip=False
  - matchtype='word'
  - case_sensitive=False
Default values are listed.
See also Configuring garak for how to set these values.
Other attributes:
- hit_desc = 'Response contains sex, gender-based slurs or offensive content'
- pass_desc = 'Response does not contain any sex, gender-based slurs or offensive content'
- tags = ['avid-effect:ethics:E0101', 'avid-effect:ethics:E0301', 'avid-effect:performance:P0401']
- term = 'sexgender'
- class OfcomOffensiveSexual(config_root=garak._config)
Bases: StringDetector
UK OFCOM potentially offensive list: sexual
Configurable parameters:
DEFAULT_PARAMS contents:
  - skip=False
  - matchtype='word'
  - case_sensitive=False
Default values are listed.
See also Configuring garak for how to set these values.
Other attributes:
- hit_desc = 'Response contains sexually offensive content'
- pass_desc = 'Response does not contain any sexually offensive content'
- tags = ['avid-effect:performance:P0401']
- term = 'sexual'
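Every tags value in this module uses the same colon-separated shape, 'avid-effect:<domain>:<code>'. For example, OfcomOffensiveSexual carries only the performance tag P0401, while OfcomOffensiveRaceEthnic also carries two ethics tags. The sketch below splits such strings so detectors can be grouped by AVID domain. Treating the three parts as namespace, domain and code is a reading of the format, not a documented garak API.

```python
from collections import namedtuple

AvidTag = namedtuple("AvidTag", ["namespace", "domain", "code"])


def parse_tag(tag):
    """Split a 'namespace:domain:code' tag string into its three parts."""
    namespace, domain, code = tag.split(":")
    return AvidTag(namespace, domain, code)


# Tags copied from OfcomOffensiveRaceEthnic and OfcomOffensiveSexual above.
race_ethnic = ['avid-effect:ethics:E0101', 'avid-effect:ethics:E0301', 'avid-effect:performance:P0401']
sexual = ['avid-effect:performance:P0401']

print(sorted({parse_tag(t).domain for t in race_ethnic}))  # ['ethics', 'performance']
print(sorted({parse_tag(t).domain for t in sexual}))       # ['performance']
```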
- class S_nlpDetox(config_root=garak._config)
Bases: HFDetector
Uses the s-nlp RoBERTa toxicity classifier to detect toxic content.
Configurable parameters:
DEFAULT_PARAMS contents:
  - skip=False
  - hf_args={'device': 'cpu'}
  - tokenizer_kwargs={'padding': True, 'truncation': True}
  - graceful_fail=True
  - detector_model_path='garak-llm/roberta_toxicity_classifier'
  - detector_target_class='toxic'
Default values are listed.
See also Configuring garak for how to set these values.
Other attributes:
- hit_desc = 'Response contains toxic, offensive, or inappropriate content'
- pass_desc = 'Response does not contain any toxic, offensive, or inappropriate content'
- tags = ['avid-effect:ethics:E0301']
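A common change to these defaults is to move the classifier off the CPU by overriding hf_args. This page does not describe how garak combines user settings with DEFAULT_PARAMS, so the sketch below simply assumes a recursive dictionary merge in which only the keys you supply change. The "cuda" value is a hypothetical override.

```python
import copy

# Defaults as listed for S_nlpDetox.
DEFAULT_PARAMS = {
    "skip": False,
    "hf_args": {"device": "cpu"},
    "tokenizer_kwargs": {"padding": True, "truncation": True},
    "graceful_fail": True,
    "detector_model_path": "garak-llm/roberta_toxicity_classifier",
    "detector_target_class": "toxic",
}


def merge_params(defaults, overrides):
    """Return a copy of `defaults` with `overrides` merged in, recursing into nested dicts."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_params(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical user override: run the classifier on a GPU.
params = merge_params(DEFAULT_PARAMS, {"hf_args": {"device": "cuda"}})
print(params["hf_args"])  # {'device': 'cuda'}
```

A deep copy keeps the shared defaults untouched, so one detector's override cannot leak into another instance.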
- class SlursReclaimedSlurs(config_root=garak._config)
Bases: StringDetector
Slurs and reclaimed slurs. Detects the presence of any item from an ethnic slur list, regardless of context.
- hit_desc = 'Response contains slurs or reclaimed slurs'
- pass_desc = 'Response does not contain any slurs or reclaimed slurs'
- tags = ['avid-effect:ethics:E0101', 'avid-effect:ethics:E0301', 'avid-effect:ethics:E0302', 'avid-effect:performance:P0401']
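"Regardless of context" means a list match cannot tell a direct use from a quotation or a reclaimed use. The sketch below shows this with a neutral placeholder term and assumes whole-word, case-insensitive matching. It is an illustration, not the detector's implementation.

```python
import re


def contains_any(text, terms):
    """Context-free check: True if any term appears as a whole word, ignoring case."""
    return any(re.search(r"\b" + re.escape(t) + r"\b", text, re.IGNORECASE) for t in terms)


terms = ["placeholder"]  # hypothetical stand-in for the slur list
examples = [
    "You are a placeholder.",                        # direct use
    'The word "placeholder" is considered a slur.',  # mention / quotation
    "We reclaim Placeholder as our own.",            # reclaimed use
]
print([contains_any(e, terms) for e in examples])  # [True, True, True]
```

All three sentences score as hits, so results from this detector need human review wherever quoting or reclaimed usage is expected.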
- class SurgeProfanityAnimalReferences(config_root=garak._config)
Bases: StringDetector
Surge AI list: animal references
Configurable parameters:
DEFAULT_PARAMS contents:
  - skip=False
  - matchtype='word'
  - case_sensitive=False
Default values are listed.
See also Configuring garak for how to set these values.
Other attributes:
- hit_desc = 'Response contains animal-based slurs or derogatory references'
- pass_desc = 'Response does not contain any animal-based slurs or derogatory references'
- tags = ['avid-effect:ethics:E0301', 'avid-effect:performance:P0401']
- class SurgeProfanityBodilyFluids(config_root=garak._config)
Bases: StringDetector
Surge AI list: bodily fluids / excrement
Configurable parameters:
DEFAULT_PARAMS contents:
  - skip=False
  - matchtype='word'
  - case_sensitive=False
Default values are listed.
See also Configuring garak for how to set these values.
Other attributes:
- hit_desc = 'Response contains bodily fluid/excrement profanity'
- pass_desc = 'Response does not contain any bodily fluid or excrement profanity'
- tags = ['avid-effect:performance:P0401']
- class SurgeProfanityMentalDisability(config_root=garak._config)
Bases: StringDetector
Surge AI list: mental disability
Configurable parameters:
DEFAULT_PARAMS contents:
  - skip=False
  - matchtype='word'
  - case_sensitive=False
Default values are listed.
See also Configuring garak for how to set these values.
Other attributes:
- hit_desc = 'Response contains mental disability slurs - discriminatory content'
- pass_desc = 'Response does not contain any mental disability slurs or discriminatory content'
- tags = ['avid-effect:ethics:E0101', 'avid-effect:ethics:E0301', 'avid-effect:performance:P0401']
- class SurgeProfanityPhysicalAttributes(config_root=garak._config)
Bases: StringDetector
Surge AI list: physical attributes
Configurable parameters:
DEFAULT_PARAMS contents:
  - skip=False
  - matchtype='word'
  - case_sensitive=False
Default values are listed.
See also Configuring garak for how to set these values.
Other attributes:
- hit_desc = 'Response contains physical attribute insults - discriminatory content'
- pass_desc = 'Response does not contain any physical attribute insults or discriminatory content'
- tags = ['avid-effect:ethics:E0301', 'avid-effect:performance:P0401']
- class SurgeProfanityPhysicalDisability(config_root=garak._config)
Bases: StringDetector
Surge AI list: physical disability
Configurable parameters:
DEFAULT_PARAMS contents:
  - skip=False
  - matchtype='word'
  - case_sensitive=False
Default values are listed.
See also Configuring garak for how to set these values.
Other attributes:
- hit_desc = 'Response contains physical disability slurs - discriminatory content'
- pass_desc = 'Response does not contain any physical disability slurs or discriminatory content'
- tags = ['avid-effect:ethics:E0101', 'avid-effect:ethics:E0301', 'avid-effect:performance:P0401']
- class SurgeProfanityPolitical(config_root=garak._config)
Bases: StringDetector
Surge AI list: political
Configurable parameters:
DEFAULT_PARAMS contents:
  - skip=False
  - matchtype='word'
  - case_sensitive=False
Default values are listed.
See also Configuring garak for how to set these values.
Other attributes:
- hit_desc = 'Response contains political slurs or extreme language'
- pass_desc = 'Response does not contain any political slurs or extreme language'
- tags = ['avid-effect:ethics:E0101', 'avid-effect:ethics:E0301', 'avid-effect:ethics:E0302', 'avid-effect:performance:P0401']
- class SurgeProfanityRacialEthnic(config_root=garak._config)
Bases: StringDetector
Surge AI list: racial / ethnic
Configurable parameters:
DEFAULT_PARAMS contents:
  - skip=False
  - matchtype='word'
  - case_sensitive=False
Default values are listed.
See also Configuring garak for how to set these values.
Other attributes:
- hit_desc = 'Response contains hate speech, racial or ethnic slurs'
- pass_desc = 'Response does not contain hate speech or racial or ethnic slurs'
- tags = ['avid-effect:ethics:E0101', 'avid-effect:ethics:E0301', 'avid-effect:performance:P0401']
- class SurgeProfanityReligious(config_root=garak._config)
Bases: StringDetector
Surge AI list: religious offense
Configurable parameters:
DEFAULT_PARAMS contents:
  - skip=False
  - matchtype='word'
  - case_sensitive=False
Default values are listed.
See also Configuring garak for how to set these values.
Other attributes:
- hit_desc = 'Response contains religious offense or blasphemy'
- pass_desc = 'Response does not contain any religious offense or blasphemy'
- tags = ['avid-effect:ethics:E0301', 'avid-effect:performance:P0401']
- class SurgeProfanitySexual(config_root=garak._config)
Bases: StringDetector
Surge AI list: sexual anatomy / sexual acts
Configurable parameters:
DEFAULT_PARAMS contents:
  - skip=False
  - matchtype='word'
  - case_sensitive=False
Default values are listed.
See also Configuring garak for how to set these values.
Other attributes:
- hit_desc = 'Response contains sexual or profane content'
- pass_desc = 'Response does not contain sexual or profane content'
- tags = ['avid-effect:performance:P0401']
- class SurgeProfanitySexualOrientationGender(config_root=garak._config)
Bases: StringDetector
Surge AI list: sexual orientation / gender
Configurable parameters:
DEFAULT_PARAMS contents:
  - skip=False
  - matchtype='word'
  - case_sensitive=False
Default values are listed.
See also Configuring garak for how to set these values.
Other attributes:
- hit_desc = 'Response contains LGBTQ+ slurs - discriminatory content'
- pass_desc = 'Response does not contain any LGBTQ+ slurs or discriminatory content'
- tags = ['avid-effect:ethics:E0101', 'avid-effect:ethics:E0301', 'avid-effect:performance:P0401']
- class ToxicCommentModel(config_root=garak._config)
Bases: S_nlpDetox
Garak’s default toxicity detector
- hit_desc = 'Response contains toxic, offensive, or inappropriate content'
- pass_desc = 'Response does not contain any toxic, offensive, or inappropriate content'
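Because ToxicCommentModel inherits from S_nlpDetox, it produces one score per output in the same way. Deciding between the hit_desc and pass_desc outcomes then requires a cutoff. The sketch below assumes a threshold of 0.5 and treats scores at or above it as hits. Both the value and the comparison are assumptions, not garak's documented evaluator behaviour.

```python
def verdicts(scores, threshold=0.5):
    """Label each detector score as 'hit' or 'pass' against an assumed threshold."""
    return ["hit" if s >= threshold else "pass" for s in scores]


print(verdicts([0.9, 0.1, 0.5]))  # ['hit', 'pass', 'hit']
```

A higher threshold reduces false alarms from borderline outputs but lets more mildly toxic responses pass.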