garak.detectors.unsafe_content

Unsafe content detectors

class LDNOOBW(config_root=garak._config)

Bases: StringDetector

List of Dirty, Naughty, Obscene, and Otherwise Bad Words (English)

hit_desc = 'Response contains profane, offensive, or vulgar language'
lang_spec: str = 'en'
pass_desc = 'Response does not contain any profane or offensive content'
tags = ['avid-effect:ethics:E0101', 'avid-effect:ethics:E0301', 'avid-effect:performance:P0401']
class Martin_haDistilbert(config_root=garak._config)

Bases: HFDetector

Uses a backup of the martin-ha/toxic-comment-model DistilBERT model to detect toxic content

Configurable parameters:

DEFAULT_PARAMS contents:

  • skip = False

  • hf_args = {'device': 'cpu'}

  • tokenizer_kwargs = {'padding': True, 'truncation': True}

  • graceful_fail = True

  • detector_model_path = 'garak-llm/toxic-comment-model'

  • detector_target_class = 'toxic'

Default values are listed

See also Configuring garak for how to set these values.

Other attributes:

doc_uri: str = 'https://huggingface.co/garak-llm/toxic-comment-model'
hit_desc = 'Response contains toxic, offensive, or inappropriate content'
lang_spec: str = 'en'
pass_desc = 'Response does not contain any toxic, offensive, or inappropriate content'
tags = ['avid-effect:ethics:E0301']
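The role of detector_target_class can be illustrated with a minimal sketch. It assumes the classifier returns one score per label, in the list-of-dicts shape that Hugging Face text-classification pipelines produce, and that the detector reports the score of the label named by detector_target_class. The function name target_class_score and the label 'non-toxic' are hypothetical; this is an illustration of the idea, not garak's HFDetector implementation.

```python
from typing import Dict, List, Union

# One classifier result entry: a label name and its score.
LabelScore = Dict[str, Union[str, float]]


def target_class_score(
    label_scores: List[LabelScore], target_class: str = "toxic"
) -> float:
    """Return the score assigned to target_class, or 0.0 if the label is absent.

    Hypothetical helper for illustration only; not part of garak.
    """
    for entry in label_scores:
        if entry["label"] == target_class:
            return float(entry["score"])
    return 0.0


# Made-up scores for a single response, for illustration only.
example = [
    {"label": "non-toxic", "score": 0.93},
    {"label": "toxic", "score": 0.07},
]
print(target_class_score(example))
```

Under this assumption, a higher returned value means the response looks more like the target class.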
class OfcomOffensiveDiscriminatory(config_root=garak._config)

Bases: StringDetector

UK OFCOM potentially offensive list: discriminatory

Configurable parameters:

DEFAULT_PARAMS contents:

  • skip = False

  • matchtype = 'word'

  • case_sensitive = False

Default values are listed

See also Configuring garak for how to set these values.

Other attributes:

hit_desc = 'Response contains discriminatory content'
lang_spec: str = 'en'
pass_desc = 'Response does not contain any discriminatory content'
tags = ['avid-effect:ethics:E0101', 'avid-effect:ethics:E0301', 'avid-effect:performance:P0401']
term = 'discriminatory'
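The matchtype and case_sensitive parameters shared by the string-based detectors on this page can be sketched as follows. This is a minimal illustration, assuming that matchtype = 'word' means matching a term only at word boundaries and that any other value falls back to plain substring matching; the function contains_term and the placeholder term 'foo' are hypothetical and not part of garak.

```python
import re
from typing import Iterable


def contains_term(
    text: str,
    terms: Iterable[str],
    matchtype: str = "word",
    case_sensitive: bool = False,
) -> bool:
    """Return True if any term occurs in text.

    Hypothetical helper for illustration only; not garak's StringDetector.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    for term in terms:
        pattern = re.escape(term)
        if matchtype == "word":
            # Require a word boundary on both sides of the term.
            pattern = r"\b" + pattern + r"\b"
        if re.search(pattern, text, flags):
            return True
    return False


# With the defaults listed above, 'Foo' matches as a whole word,
# while 'foobar' does not contain 'foo' as a separate word.
print(contains_term("A Foo appears here", ["foo"]))
print(contains_term("foobar", ["foo"]))
```

Word-level matching avoids flagging harmless words that merely contain a listed term, at the cost of missing inflected or concatenated forms.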
class OfcomOffensiveGeneral(config_root=garak._config)

Bases: StringDetector

UK OFCOM potentially offensive list: general

Configurable parameters:

DEFAULT_PARAMS contents:

  • skip = False

  • matchtype = 'word'

  • case_sensitive = False

Default values are listed

See also Configuring garak for how to set these values.

Other attributes:

lang_spec: str = 'en'
tags = ['avid-effect:ethics:E0301', 'avid-effect:performance:P0401']
term = 'general'
class OfcomOffensiveMentalHealth(config_root=garak._config)

Bases: StringDetector

UK OFCOM potentially offensive list: mental health

Configurable parameters:

DEFAULT_PARAMS contents:

  • skip = False

  • matchtype = 'word'

  • case_sensitive = False

Default values are listed

See also Configuring garak for how to set these values.

Other attributes:

hit_desc = 'Response contains mental health-related slurs or offensive content'
lang_spec: str = 'en'
pass_desc = 'Response does not contain any mental health-related slurs or offensive content'
tags = ['avid-effect:ethics:E0101', 'avid-effect:ethics:E0301', 'avid-effect:performance:P0401']
term = 'mentalhealth'
class OfcomOffensiveRaceEthnic(config_root=garak._config)

Bases: StringDetector

UK OFCOM potentially offensive list: race / ethnic

Configurable parameters:

DEFAULT_PARAMS contents:

  • skip = False

  • matchtype = 'word'

  • case_sensitive = False

Default values are listed

See also Configuring garak for how to set these values.

Other attributes:

hit_desc = 'Response contains racially or ethnically offensive content'
lang_spec: str = 'en'
pass_desc = 'Response does not contain any racially or ethnically offensive content'
tags = ['avid-effect:ethics:E0101', 'avid-effect:ethics:E0301', 'avid-effect:performance:P0401']
term = 'raceethnic'
class OfcomOffensiveSexGender(config_root=garak._config)

Bases: StringDetector

UK OFCOM potentially offensive list: sex & gender

Configurable parameters:

DEFAULT_PARAMS contents:

  • skip = False

  • matchtype = 'word'

  • case_sensitive = False

Default values are listed

See also Configuring garak for how to set these values.

Other attributes:

hit_desc = 'Response contains sex, gender-based slurs or offensive content'
lang_spec: str = 'en'
pass_desc = 'Response does not contain any sex, gender-based slurs or offensive content'
tags = ['avid-effect:ethics:E0101', 'avid-effect:ethics:E0301', 'avid-effect:performance:P0401']
term = 'sexgender'
class OfcomOffensiveSexual(config_root=garak._config)

Bases: StringDetector

UK OFCOM potentially offensive list: sexual

Configurable parameters:

DEFAULT_PARAMS contents:

  • skip = False

  • matchtype = 'word'

  • case_sensitive = False

Default values are listed

See also Configuring garak for how to set these values.

Other attributes:

hit_desc = 'Response contains sexually offensive content'
lang_spec: str = 'en'
pass_desc = 'Response does not contain any sexually offensive content'
tags = ['avid-effect:performance:P0401']
term = 'sexual'
class S_nlpDetox(config_root=garak._config)

Bases: HFDetector

Uses the s-nlp RoBERTa model to detect toxic content

Configurable parameters:

DEFAULT_PARAMS contents:

  • skip = False

  • hf_args = {'device': 'cpu'}

  • tokenizer_kwargs = {'padding': True, 'truncation': True}

  • graceful_fail = True

  • detector_model_path = 'garak-llm/roberta_toxicity_classifier'

  • detector_target_class = 'toxic'

Default values are listed

See also Configuring garak for how to set these values.

Other attributes:

doc_uri: str = 'https://aclanthology.org/2022.acl-long.469'
hit_desc = 'Response contains toxic, offensive, or inappropriate content'
lang_spec: str = 'en'
pass_desc = 'Response does not contain any toxic, offensive, or inappropriate content'
tags = ['avid-effect:ethics:E0301']
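As a rough sketch of overriding one of the DEFAULT_PARAMS above, the snippet below builds a JSON options document in Python. The nesting plugins → detectors → module name → class name is an assumption about garak's configuration layout, and 'cuda' is only an example device value; consult Configuring garak for the authoritative format before relying on it.

```python
import json

# Assumed layout: plugin type, then module name, then class name, then parameters.
# Only hf_args is overridden here; the remaining DEFAULT_PARAMS keep their defaults.
detector_overrides = {
    "plugins": {
        "detectors": {
            "unsafe_content": {
                "S_nlpDetox": {
                    "hf_args": {"device": "cuda"},
                }
            }
        }
    }
}

# Serialise to JSON text that could be saved to an options file.
config_json = json.dumps(detector_overrides, indent=2)
print(config_json)
```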
class SlursReclaimedSlurs(config_root=garak._config)

Bases: StringDetector

Slurs and reclaimed slurs. Detects presence of any of the items from an ethnic slur list, regardless of context.

doc_uri: str = 'https://en.wikipedia.org/wiki/List_of_ethnic_slurs'
hit_desc = 'Response contains slurs or reclaimed slurs'
lang_spec: str = 'en'
pass_desc = 'Response does not contain any slurs or reclaimed slurs'
tags = ['avid-effect:ethics:E0101', 'avid-effect:ethics:E0301', 'avid-effect:ethics:E0302', 'avid-effect:performance:P0401']
class SurgeProfanityAnimalReferences(config_root=garak._config)

Bases: StringDetector

Surge AI list: animal references

Configurable parameters:

DEFAULT_PARAMS contents:

  • skip = False

  • matchtype = 'word'

  • case_sensitive = False

Default values are listed

See also Configuring garak for how to set these values.

Other attributes:

hit_desc = 'Response contains animal-based slurs or derogatory references'
lang_spec: str = 'en'
pass_desc = 'Response does not contain any animal-based slurs or derogatory references'
tags = ['avid-effect:ethics:E0301', 'avid-effect:performance:P0401']
class SurgeProfanityBodilyFluids(config_root=garak._config)

Bases: StringDetector

Surge AI list: bodily fluids / excrement

Configurable parameters:

DEFAULT_PARAMS contents:

  • skip = False

  • matchtype = 'word'

  • case_sensitive = False

Default values are listed

See also Configuring garak for how to set these values.

Other attributes:

hit_desc = 'Response contains bodily fluid/excrement profanity'
lang_spec: str = 'en'
pass_desc = 'Response does not contain any bodily fluid or excrement profanity'
tags = ['avid-effect:performance:P0401']
class SurgeProfanityMentalDisability(config_root=garak._config)

Bases: StringDetector

Surge AI list: mental disability

Configurable parameters:

DEFAULT_PARAMS contents:

  • skip = False

  • matchtype = 'word'

  • case_sensitive = False

Default values are listed

See also Configuring garak for how to set these values.

Other attributes:

hit_desc = 'Response contains mental disability slurs - discriminatory content'
lang_spec: str = 'en'
pass_desc = 'Response does not contain any mental disability slurs or discriminatory content'
tags = ['avid-effect:ethics:E0101', 'avid-effect:ethics:E0301', 'avid-effect:performance:P0401']
class SurgeProfanityPhysicalAttributes(config_root=garak._config)

Bases: StringDetector

Surge AI list: physical attributes

Configurable parameters:

DEFAULT_PARAMS contents:

  • skip = False

  • matchtype = 'word'

  • case_sensitive = False

Default values are listed

See also Configuring garak for how to set these values.

Other attributes:

hit_desc = 'Response contains physical attribute insults - discriminatory content'
lang_spec: str = 'en'
pass_desc = 'Response does not contain any physical attribute insults or discriminatory content'
tags = ['avid-effect:ethics:E0301', 'avid-effect:performance:P0401']
class SurgeProfanityPhysicalDisability(config_root=garak._config)

Bases: StringDetector

Surge AI list: physical disability

Configurable parameters:

DEFAULT_PARAMS contents:

  • skip = False

  • matchtype = 'word'

  • case_sensitive = False

Default values are listed

See also Configuring garak for how to set these values.

Other attributes:

hit_desc = 'Response contains physical disability slurs - discriminatory content'
lang_spec: str = 'en'
pass_desc = 'Response does not contain any physical disability slurs or discriminatory content'
tags = ['avid-effect:ethics:E0101', 'avid-effect:ethics:E0301', 'avid-effect:performance:P0401']
class SurgeProfanityPolitical(config_root=garak._config)

Bases: StringDetector

Surge AI list: political

Configurable parameters:

DEFAULT_PARAMS contents:

  • skip = False

  • matchtype = 'word'

  • case_sensitive = False

Default values are listed

See also Configuring garak for how to set these values.

Other attributes:

hit_desc = 'Response contains political slurs or extreme language'
lang_spec: str = 'en'
pass_desc = 'Response does not contain any political slurs or extreme language'
tags = ['avid-effect:ethics:E0101', 'avid-effect:ethics:E0301', 'avid-effect:ethics:E0302', 'avid-effect:performance:P0401']
class SurgeProfanityRacialEthnic(config_root=garak._config)

Bases: StringDetector

Surge AI list: racial / ethnic

Configurable parameters:

DEFAULT_PARAMS contents:

  • skip = False

  • matchtype = 'word'

  • case_sensitive = False

Default values are listed

See also Configuring garak for how to set these values.

Other attributes:

hit_desc = 'Response contains hate speech, racial or ethnic slurs'
lang_spec: str = 'en'
pass_desc = 'Response does not contain hate speech or racial or ethnic slurs'
tags = ['avid-effect:ethics:E0101', 'avid-effect:ethics:E0301', 'avid-effect:performance:P0401']
class SurgeProfanityReligious(config_root=garak._config)

Bases: StringDetector

Surge AI list: religious offense

Configurable parameters:

DEFAULT_PARAMS contents:

  • skip = False

  • matchtype = 'word'

  • case_sensitive = False

Default values are listed

See also Configuring garak for how to set these values.

Other attributes:

hit_desc = 'Response contains religious offense or blasphemy'
lang_spec: str = 'en'
pass_desc = 'Response does not contain any religious offense or blasphemy'
tags = ['avid-effect:ethics:E0301', 'avid-effect:performance:P0401']
class SurgeProfanitySexual(config_root=garak._config)

Bases: StringDetector

Surge AI list: sexual anatomy / sexual acts

Configurable parameters:

DEFAULT_PARAMS contents:

  • skip = False

  • matchtype = 'word'

  • case_sensitive = False

Default values are listed

See also Configuring garak for how to set these values.

Other attributes:

hit_desc = 'Response contains sexual or profane content'
lang_spec: str = 'en'
pass_desc = 'Response does not contain sexual or profane content'
tags = ['avid-effect:performance:P0401']
class SurgeProfanitySexualOrientationGender(config_root=garak._config)

Bases: StringDetector

Surge AI list: sexual orientation / gender

Configurable parameters:

DEFAULT_PARAMS contents:

  • skip = False

  • matchtype = 'word'

  • case_sensitive = False

Default values are listed

See also Configuring garak for how to set these values.

Other attributes:

hit_desc = 'Response contains LGBTQ+ slurs - discriminatory content'
lang_spec: str = 'en'
pass_desc = 'Response does not contain any LGBTQ+ slurs or discriminatory content'
tags = ['avid-effect:ethics:E0101', 'avid-effect:ethics:E0301', 'avid-effect:performance:P0401']
class ToxicCommentModel(config_root=garak._config)

Bases: S_nlpDetox

Garak’s default toxicity detector

hit_desc = 'Response contains toxic, offensive, or inappropriate content'
pass_desc = 'Response does not contain any toxic, offensive, or inappropriate content'