Translation support

Garak supports translation of probe and detector keywords and triggers, allowing testing of models that accept and produce text in languages other than the language the plugin was written for.

Limitations

  • This functionality is strongly coupled to BCP47 code “en” for sentence detection and structure at this time.

  • Reverse translation is required for snowball probes and Huggingface detectors due to model load formats.

  • Huggingface detectors primarily load English models, requiring a target-language NLI model for the detector.

  • If probes or detectors fail to load, you may need to choose a smaller local translation model or utilize a remote service.

  • Translation may add significant execution time to the run depending on resources available.

Supported translation services

Local translation uses Hugging Face models (MarianMT by default, with optional M2M100 support); remote translation is available via the DeepL and NVIDIA Riva APIs.

API key requirements

To use the DeepL API or Riva API to translate probe and detector keywords and triggers via a cloud service, an API key must be supplied.

API keys for the preferred service can be obtained from the respective provider, DeepL or NVIDIA Riva. Each remote service supports its own set of languages; consult the provider's documentation for details.

API keys can be stored in environment variables with the following commands:

DeepL

export DEEPL_API_KEY=xxxx

Riva

export RIVA_API_KEY=xxxx

Configuration file

The translation function is configured in the run section of a configuration file with the following keys:

  • target_lang - A single BCP47 entry designating the language of the target under test; “ja”, “fr”, “jap” etc.

  • langproviders - A list of translator configurations, each designated by a language pair.

  • Note: The Helsinki-NLP/opus-mt-{source},{target} case uses different language formats. The language codes used to name these models are inconsistent: two-letter codes can usually be found in the Google Admin SDK, while three-letter codes require a search such as “language code {code}”. More details can be found in the OPUS-MT-train repository. The local example below, for instance, uses “jap” rather than “ja” because the corresponding model is Helsinki-NLP/opus-mt-en-jap.

Each language provider is configured using the project’s configurable pattern with the following keys:

  • language - (required) A comma-separated pair of BCP47 entries describing the translation pair provided by the configuration

  • model_type - (required) the langproviders module and optional instance class to be instantiated; local, remote, remote.DeeplTranslator etc.

  • model_name - (conditional) the model name loaded for translation. This field is required for the local translator model_type

(Optional) Model-specific parameters defined by the translator model type may exist.

  • Note: local translation support loads a model and is not designed to support crossing the multi-processing boundary.

The translator configuration can be written to a file and the path passed with the --config CLI option.

An example template is provided below.

run:
  target_lang: <target-language-code>
  langproviders:
    - language: <source-language-code>,<target-language-code>
      api_key: <your-API-key>
      model_type: <translator-module-or-module.classname>
      model_name: <huggingface-model-name>
    - language: <target-language-code>,<source-language-code>
      api_key: <your-API-key>
      model_type: <translator-module-or-module.classname>
      model_name: <huggingface-model-name>

  • Note: each translator is configured for a single translation pair, and a configuration is required in each direction for a run to proceed.

Examples for translation configuration

DeepL

To use DeepL translation in garak, use the following YAML config and run the command shown below:

run:
  target_lang: <target-language-code>
  langproviders:
    - language: <source-language-code>,<target-language-code>
      model_type: remote.DeeplTranslator
    - language: <target-language-code>,<source-language-code>
      model_type: remote.DeeplTranslator

export DEEPL_API_KEY=xxxx
python3 -m garak --model_type nim --model_name meta/llama-3.1-8b-instruct --probes encoding --config {path to your yaml config file}

Riva

To use Riva translation in garak, use the following YAML config and run the command shown below:

run:
  target_lang: <target-language-code>
  langproviders:
    - language: <source-language-code>,<target-language-code>
      model_type: remote
    - language: <target-language-code>,<source-language-code>
      model_type: remote

export RIVA_API_KEY=xxxx
python3 -m garak --model_type nim --model_name meta/llama-3.1-8b-instruct --probes encoding --config {path to your yaml config file}

Local

For local translation, use the following YAML config and run the command shown below:

run:
  target_lang: jap
  langproviders:
    - language: en,jap
      model_type: local
    - language: jap,en
      model_type: local

python3 -m garak --model_type nim --model_name meta/llama-3.1-8b-instruct --probes encoding --config {path to your yaml config file}

The default configuration will load Helsinki-NLP MarianMT models for local translation.
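
Since the local provider downloads these MarianMT weights on first use, pre-fetching them into the Hugging Face cache can reduce run time (see the execution-time note under Limitations). Below is a minimal sketch, assuming the en,jap pair from the local example resolves to the Helsinki-NLP/opus-mt-en-jap checkpoint; the exact model garak selects depends on the configured language pair, and the reverse pair needs its own model.

from transformers import MarianMTModel, MarianTokenizer

# Assumed checkpoint for the en,jap pair above; adjust to your language pair.
model_name = "Helsinki-NLP/opus-mt-en-jap"
MarianTokenizer.from_pretrained(model_name)  # caches the tokenizer files locally
MarianMTModel.from_pretrained(model_name)    # caches the model weights locally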

Additional support for the Hugging Face M2M100 model type only is enabled by providing a model_name for local translators. The model name provided must contain m2m100 to be loaded by garak.

run:
  target_lang: ja
  langproviders:
    - language: en,ja
      model_type: local
      model_name: facebook/m2m100_418M
    - language: ja,en
      model_type: local
      model_name: facebook/m2m100_418M

python3 -m garak --model_type nim --model_name meta/llama-3.1-8b-instruct --probes encoding --config {path to your yaml config file}