Translation support
Garak supports translation of probe and detector keywords and triggers. This allows testing of models that accept and produce text in languages other than the one the plugin was written for.
Limitations
- This functionality is currently strongly coupled to the BCP47 code “en” for sentence detection and structure.
- Reverse translation is required for snowball probes and for Huggingface detectors because of model load formats.
- Huggingface detectors primarily load English models, so running a detector directly on target-language text would require an NLI model for the target language.
- If probes or detectors fail to load, you may need to choose a smaller local translation model or use a remote service.
- Translation may add significant execution time to the run, depending on the resources available.
Supported translation services
- Huggingface: This project supports usage of the following translation models:
API key requirements
To use the DeepL API or the Riva API to translate probe and detector keywords and triggers through cloud services, an API key must be supplied.
API keys for the preferred service can be obtained in the following locations:
Supported languages for remote services:
API keys can be stored in environment variables with the following commands:
DeepL
export DEEPL_API_KEY=xxxx
RIVA
export RIVA_API_KEY=xxxx
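Before starting a long run, it can help to confirm that the key for the chosen remote service is visible to the process. The following Python sketch checks the two environment variables named above. The helper name missing_api_keys and the service-to-variable mapping are illustrative assumptions and are not part of garak.

```python
import os

# Variable names are taken from the export commands above.
# The mapping and the helper are illustrative only, not garak API.
API_KEY_ENV_VARS = {
    "deepl": "DEEPL_API_KEY",
    "riva": "RIVA_API_KEY",
}


def missing_api_keys(services, environ=None):
    """Return the env var names that are unset or empty for the given services."""
    environ = os.environ if environ is None else environ
    missing = []
    for service in services:
        var = API_KEY_ENV_VARS[service.lower()]
        if not environ.get(var):
            missing.append(var)
    return missing


if __name__ == "__main__":
    unset = missing_api_keys(["deepl"])
    if unset:
        print("Set these variables before running garak:", ", ".join(unset))
```

Passing a dict as environ makes the check easy to exercise without touching the real environment.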
Configuration file
Translation is configured in the run section of a configuration file with the following keys:
- target_lang: A single BCP47 entry designating the language of the target under test, such as “ja”, “fr”, or “jap”.
- langproviders: A list of translator configurations, one per language pair.
Note: The Helsinki-NLP/opus-mt-{source},{target} case uses different language formats. The language codes used to name models are inconsistent. Two-letter codes can usually be found in the Google Admin SDK, while three-letter codes require a search such as “language code {code}”. More details can be found in the OPUS-MT-train repository.
A language provider configuration is provided using the project’s configurable pattern with the following keys:
- language: (required) A comma-separated pair of BCP47 entries describing the translation direction provided by the configuration, source language first and target language second.
- model_type: (required) The langproviders module and optional instance class to be instantiated, such as local, remote, or remote.DeeplTranslator.
- model_name: (conditional) The model name loaded for translation. This field is required for the local translator model_type.
- (Optional) Model-specific parameters defined by the translator model type may also be present.
Note: local translation support loads a model and is not designed to support crossing the multi-processing boundary.
The translator configuration can be written to a file and its path passed with the --config CLI option.
An example template is provided below.
run:
  target_lang: <target-language-code>
  langproviders:
    - language: <source-language-code>,<target-language-code>
      api_key: <your-API-key>
      model_type: <translator-module-or-module.classname>
      model_name: <huggingface-model-name>
    - language: <target-language-code>,<source-language-code>
      api_key: <your-API-key>
      model_type: <translator-module-or-module.classname>
      model_name: <huggingface-model-name>
Note: Each translator is configured for a single translation pair, and a translator must be specified for each direction for a run to proceed.
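The requirement in the note above can be checked before a run. The following is a minimal Python sketch, not garak code: the helper names parse_pair and missing_directions are hypothetical, the configuration is assumed to be already loaded into a dict, and the source language is assumed to be “en”, in line with the limitation noted earlier.

```python
def parse_pair(language):
    """Split a 'source,target' string into a (source, target) tuple."""
    parts = [p.strip() for p in language.split(",")]
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected '<source>,<target>', got {language!r}")
    return parts[0], parts[1]


def missing_directions(run_config, source_lang="en"):
    """Return the (source, target) pairs that have no translator entry.

    run_config is the content of the run section as a dict.
    source_lang defaults to "en", which is an assumption of this sketch.
    """
    target = run_config["target_lang"]
    configured = {
        parse_pair(provider["language"])
        for provider in run_config.get("langproviders", [])
    }
    required = [(source_lang, target), (target, source_lang)]
    return [pair for pair in required if pair not in configured]


if __name__ == "__main__":
    # Only the forward direction is configured, so the reverse is reported.
    config = {
        "target_lang": "ja",
        "langproviders": [{"language": "en,ja", "model_type": "local"}],
    }
    print(missing_directions(config))  # prints [('ja', 'en')]
```

An empty result means both directions are covered. Anything else lists the pairs that still need a langproviders entry.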
Examples for translation configuration
DeepL
To use DeepL translation in garak, save the following YAML config to a file, export your API key, and run the command shown after the config.
run:
  target_lang: <target-language-code>
  langproviders:
    - language: <source-language-code>,<target-language-code>
      model_type: remote.DeeplTranslator
    - language: <target-language-code>,<source-language-code>
      model_type: remote.DeeplTranslator
export DEEPL_API_KEY=xxxx
python3 -m garak --model_type nim --model_name meta/llama-3.1-8b-instruct --probes encoding --config {path to your yaml config file}
Riva
For Riva, save the following YAML config to a file, export your API key, and run the command shown after the config.
run:
  target_lang: <target-language-code>
  langproviders:
    - language: <source-language-code>,<target-language-code>
      model_type: remote
    - language: <target-language-code>,<source-language-code>
      model_type: remote
export RIVA_API_KEY=xxxx
python3 -m garak --model_type nim --model_name meta/llama-3.1-8b-instruct --probes encoding --config {path to your yaml config file}
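The DeepL and Riva configs above differ only in model_type, so the two-direction file can be generated rather than written by hand. The sketch below renders the run section as YAML text using plain string formatting. The function name translation_config_yaml is hypothetical, and the default source language of “en” is an assumption of this example rather than something garak enforces here.

```python
def translation_config_yaml(target_lang, model_type, source_lang="en", model_name=None):
    """Render a run section with one translator entry per direction.

    source_lang defaults to "en" (an assumption of this sketch).
    model_name is written only when it is provided.
    """
    lines = ["run:", f"  target_lang: {target_lang}", "  langproviders:"]
    for src, tgt in ((source_lang, target_lang), (target_lang, source_lang)):
        lines.append(f"    - language: {src},{tgt}")
        lines.append(f"      model_type: {model_type}")
        if model_name is not None:
            lines.append(f"      model_name: {model_name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write a DeepL config for French; the file name is arbitrary.
    with open("translation_fr.yaml", "w", encoding="utf-8") as handle:
        handle.write(translation_config_yaml("fr", "remote.DeeplTranslator"))
```

The resulting file path can then be passed with the --config option, as in the commands above.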
Local
For local translation, save the following YAML config to a file and run the command shown after the config.
run:
  target_lang: jap
  langproviders:
    - language: en,jap
      model_type: local
    - language: jap,en
      model_type: local
python3 -m garak --model_type nim --model_name meta/llama-3.1-8b-instruct --probes encoding --config {path to your yaml config file}
The default configuration will load Helsinki-NLP MarianMT models for local translation.
Additional support for the Huggingface M2M100Model type only is enabled by providing model_name for local translators. The model name provided must contain m2m100 to be loaded by garak.
run:
  target_lang: ja
  langproviders:
    - language: en,ja
      model_type: local
      model_name: facebook/m2m100_418M
    - language: ja,en
      model_type: local
      model_name: facebook/m2m100_418M
python3 -m garak --model_type nim --model_name meta/llama-3.1-8b-instruct --probes encoding --config {path to your yaml config file}