garak.generators.nvcf
This garak generator is a connector to NVIDIA Cloud Functions. It permits fast and flexible generation.
An NVCF function is invoked by sending a request to an invocation endpoint and then
polling a status endpoint until the response is received. The cloud function is identified
by a UUID, which is passed to garak as the target_name. The API key should be placed in
the environment variable NVCF_API_KEY or set in a garak config. For example:
export NVCF_API_KEY="example-api-key-xyz"
garak -t nvcf -n 341da0d0-aa68-4c4f-89b5-fc39286de6a1
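The invoke-then-poll flow described above can be sketched roughly as follows. This is a minimal illustration, not garak's implementation: the `invoke` and `poll` callables are hypothetical stand-ins for the HTTP calls, and the use of status 202 for "still running" and an `NVCF-REQID` header for the request id are assumptions made for the sketch.

```python
import time
from typing import Callable, Dict, Tuple

# A response is modelled as (status_code, headers, body). This shape is an
# assumption made for the sketch, not garak's internal representation.
Response = Tuple[int, Dict[str, str], dict]


def invoke_and_poll(
    invoke: Callable[[], Response],
    poll: Callable[[str], Response],
    poll_interval: float = 0.0,
    max_polls: int = 10,
) -> dict:
    """Send one request, then poll the status endpoint until a result arrives."""
    status, headers, body = invoke()
    # Assumed header name carrying the id of the pending request.
    request_id = headers.get("NVCF-REQID", "")
    polls = 0
    # 202 is assumed to mean "accepted, result not ready yet".
    while status == 202:
        if polls >= max_polls:
            raise TimeoutError("no result after polling")
        time.sleep(poll_interval)
        status, headers, body = poll(request_id)
        polls += 1
    if status != 200:
        raise RuntimeError(f"request failed with status {status}")
    return body
```

Passing the transport in as callables keeps the polling logic separate from the HTTP client, so the loop can be exercised without network access.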
Configuration
Configurable values:
temperature - Temperature for generation. Passed as a value to the endpoint.
top_p - Nucleus sampling threshold: the cumulative probability mass of tokens to sample from. Passed as a value to the endpoint.
invoke_uri_base - Base URL for the NVCF endpoint (default is for NVIDIA-hosted functions).
status_uri_base - URL to check for request status updates (default is for NVIDIA-hosted functions).
timeout - Read timeout for HTTP requests (note: this is the network timeout, distinct from the inference timeout)
version_id - API version id, appended to endpoint URLs if supplied
stop_on_404 - Give up on endpoints returning 404 (i.e. nonexistent ones)
extra_params - Dictionary of optional extra values to pass to the endpoint. Default
{"stream": False}.
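As a rough illustration of how invoke_uri_base, the function UUID, and version_id could combine into a request URL, consider the sketch below. The helper name `build_invoke_uri` is hypothetical, and the `/versions/` path segment is an assumption; check the garak source for the exact layout.

```python
from typing import Optional

# Default base URL as listed in the generator's DEFAULT_PARAMS.
DEFAULT_INVOKE_URI_BASE = "https://api.nvcf.nvidia.com/v2/nvcf/pexec/functions/"


def build_invoke_uri(
    function_id: str,
    invoke_uri_base: str = DEFAULT_INVOKE_URI_BASE,
    version_id: Optional[str] = None,
) -> str:
    """Join the base URL and function UUID, adding the version id if given."""
    uri = invoke_uri_base + function_id
    if version_id is not None:
        # Assumed path layout for versioned functions.
        uri += f"/versions/{version_id}"
    return uri
```

With no version_id, the result for the example UUID matches the URL used in the cURL request in this section.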
Some NVCF instances require custom parameters, for example a “model” value in the request body. These can be set in the NVCF config. For example, this cURL request maps to the garak YAML that follows it:
curl -s -X POST 'https://api.nvcf.nvidia.com/v2/nvcf/pexec/functions/341da0d0-aa68-4c4f-89b5-fc39286de6a1' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer example-api-key-xyz' \
  -d '{
    "messages": [{"role": "user", "content": "How many letters are in the word strawberry?"}],
    "model": "prefix/obsidianorder/terer-nor",
    "max_tokens": 1024,
    "stream": false
  }'
---
plugins:
  generators:
    nvcf:
      NvcfChat:
        api_key: example-api-key-xyz
        max_tokens: 1024
        extra_params:
          stream: false
          model: prefix/obsidianorder/terer-nor
  target_type: nvcf.NvcfChat
  target_name: 341da0d0-aa68-4c4f-89b5-fc39286de6a1
The nvcf generator uses the standard garak generator mechanism for
max_tokens, which is why this value is set at generator-level rather than
as a key-value pair in extra_params.
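To make the mapping between config and request body concrete, here is a minimal sketch of how generator-level values and extra_params might be merged into one payload. The function `build_payload` is hypothetical, and the merge order (extra_params applied last) is an assumption rather than a description of garak's code. Default values are taken from DEFAULT_PARAMS.

```python
from typing import Any, Dict, Optional


def build_payload(
    prompt: str,
    max_tokens: int = 150,
    temperature: float = 0.2,
    top_p: float = 0.7,
    extra_params: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble a chat request body from generator settings and extras."""
    payload: Dict[str, Any] = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": top_p,
        # max_tokens comes from the generator-level setting, not extra_params.
        "max_tokens": max_tokens,
    }
    # Extra values are merged last (an assumption); default mirrors the docs.
    payload.update(extra_params if extra_params is not None else {"stream": False})
    return payload


# Usage mirroring the YAML example: max_tokens is a top-level generator
# option, while "model" and "stream" travel in extra_params.
example = build_payload(
    "How many letters are in the word strawberry?",
    max_tokens=1024,
    extra_params={"stream": False, "model": "prefix/obsidianorder/terer-nor"},
)
```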
Scaling
The NVCF generator supports parallelisation, and using it is recommended: invoke garak
with --parallel_attempts set to a value higher than one.
If the NVCF function times out due to insufficient capacity, garak will note this,
back off, and retry the request later.
garak -t nvcf -n 341da0d0-aa68-4c4f-89b5-fc39286de6a1 --parallel_attempts 32
Or, as YAML config:
---
system:
  parallel_attempts: 32
plugins:
  target_type: nvcf.NvcfChat
  target_name: 341da0d0-aa68-4c4f-89b5-fc39286de6a1
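The back-off-and-retry behaviour mentioned above can be pictured with a small sketch like the one below. It is illustrative only: `CapacityTimeout` and `call_with_backoff` are hypothetical names, the exponential schedule is an assumption, and garak's actual retry logic may differ.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CapacityTimeout(Exception):
    """Hypothetical error raised when the function has no spare capacity."""


def call_with_backoff(
    request: Callable[[], T],
    max_tries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `request` on capacity timeouts, doubling the wait each time."""
    for attempt in range(max_tries):
        try:
            return request()
        except CapacityTimeout:
            if attempt == max_tries - 1:
                raise  # out of attempts, surface the error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_tries must be at least 1")
```

The `sleep` argument is injectable so the schedule can be checked without actually waiting.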
NVCF LLM interface
- class NvcfChat(name=None, config_root=<module 'garak._config'>)
Bases: Generator
Wrapper for NVIDIA Cloud Functions Chat models via NGC. Expects NVCF_API_KEY environment variable.
Configurable parameters (DEFAULT_PARAMS contents, with default values listed):
- max_tokens = 150
- temperature = 0.2
- top_k = None
- context_len = None
- skip_seq_start = None
- skip_seq_end = None
- top_p = 0.7
- status_uri_base = 'https://api.nvcf.nvidia.com/v2/nvcf/pexec/status/'
- invoke_uri_base = 'https://api.nvcf.nvidia.com/v2/nvcf/pexec/functions/'
- timeout = 60
- version_id = None
- stop_on_404 = True
- extra_params = {'stream': False}
See also Configuring garak for how to set these values.
Other attributes:
- ENV_VAR = 'NVCF_API_KEY'
- generator_family_name = 'NVCF'
- supports_multiple_generations = False