garak.generators.nvcf

This garak generator is a connector to NVIDIA Cloud Functions. It permits fast and flexible generation.

NVCF functions work by sending a request to an invocation endpoint and then polling a status endpoint until the response is received. The cloud function is identified by a UUID, which is passed to garak as the target_name. The API key should be placed in the environment variable NVCF_API_KEY or set in a garak config. For example:

export NVCF_API_KEY="example-api-key-xyz"
garak -t nvcf -n 341da0d0-aa68-4c4f-89b5-fc39286de6a1
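
The invoke-then-poll flow described above can be sketched in Python as follows. This is an illustrative sketch, not garak's actual implementation: the function name invoke_and_poll and its parameters are hypothetical, and the use of HTTP status 202 for "still running" together with an NVCF-REQID response header to carry the request ID is an assumption about the endpoint's behaviour. The HTTP calls are passed in as callables so the logic can be exercised without network access.

```python
import time


def invoke_and_poll(post, get, invoke_url, status_uri_base, payload, headers,
                    poll_interval=0.0, max_polls=10):
    """Send one request, then poll the status endpoint until a result arrives.

    `post(url, payload, headers)` and `get(url, headers)` are caller-supplied
    callables returning an object with `.status_code`, `.headers` and `.json()`.
    With the `requests` library these could be, for example:
        post = lambda url, body, h: requests.post(url, json=body, headers=h, timeout=60)
        get = lambda url, h: requests.get(url, headers=h, timeout=60)
    """
    response = post(invoke_url, payload, headers)
    polls = 0
    # Assumption: 202 means "accepted, result not ready yet".
    while response.status_code == 202:
        if polls >= max_polls:
            raise TimeoutError(f"no result after {max_polls} polls")
        # Assumption: the request ID is returned in the NVCF-REQID header.
        request_id = response.headers["NVCF-REQID"]
        time.sleep(poll_interval)
        response = get(status_uri_base + request_id, headers)
        polls += 1
    if response.status_code != 200:
        raise RuntimeError(f"unexpected status {response.status_code}")
    return response.json()
```

Capping the number of polls keeps a stalled function from blocking the caller indefinitely; the cap and the interval are illustrative values, not garak defaults.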

Configuration

Configurable values:

temperature - Temperature for generation. Passed as a value to the endpoint.
top_p - Cumulative probability cutoff for nucleus (top-p) sampling. Passed as a value to the endpoint.
invoke_uri_base - Base URL for the NVCF endpoint (default is for NVIDIA-hosted functions).
status_uri_base - URL to check for request status updates (default is for NVIDIA-hosted functions).
timeout - Read timeout for HTTP requests (note: this is the network timeout, distinct from the inference timeout).
version_id - API version ID, appended to endpoint URLs if supplied.
stop_on_404 - Give up on endpoints returning 404 (i.e. nonexistent ones).
extra_params - Dictionary of optional extra values to pass to the endpoint. Default {"stream": False}.
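
As a rough illustration of how invoke_uri_base, the function UUID, and version_id might combine into a request URL, consider the sketch below. The helper name build_invoke_uri is hypothetical, and the "/versions/" path segment is an assumption about the NVCF URL layout rather than something stated in this document.

```python
def build_invoke_uri(invoke_uri_base, function_id, version_id=None):
    """Join the base URL and function UUID, adding a version suffix if given."""
    uri = invoke_uri_base + function_id
    if version_id is not None:
        # Assumption: versions are addressed as <function>/versions/<version_id>.
        uri += "/versions/" + version_id
    return uri


base = "https://api.nvcf.nvidia.com/v2/nvcf/pexec/functions/"
function_id = "341da0d0-aa68-4c4f-89b5-fc39286de6a1"
print(build_invoke_uri(base, function_id))
# "example-version" is a placeholder, not a real version ID.
print(build_invoke_uri(base, function_id, version_id="example-version"))
```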

Some NVCF instances require custom parameters, for example a “model” field in the request body. These can be set in the NVCF config. For example, the following cURL request maps to the garak YAML shown after it:

curl -s -X POST 'https://api.nvcf.nvidia.com/v2/nvcf/pexec/functions/341da0d0-aa68-4c4f-89b5-fc39286de6a1' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer example-api-key-xyz' \
-d '{
      "messages": [{"role": "user", "content": "How many letters are in the word strawberry?"}],
      "model": "prefix/obsidianorder/terer-nor",
      "max_tokens": 1024,
      "stream": false
   }'

---
plugins:
   generators:
      nvcf:
         NvcfChat:
            api_key: example-api-key-xyz
            max_tokens: 1024
            extra_params:
               stream: false
               model: prefix/obsidianorder/terer-nor
   target_type: nvcf.NvcfChat
   target_name: 341da0d0-aa68-4c4f-89b5-fc39286de6a1

The nvcf generator uses the standard garak generator mechanism for max_tokens, which is why this value is set at generator-level rather than as a key-value pair in extra_params.
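
To make the mapping concrete, the sketch below shows one way generator-level settings and extra_params could be merged into a request body like the one in the cURL example. It is an assumption-laden illustration, not garak's code: build_chat_payload is a hypothetical helper, and the default values are taken from the DEFAULT_PARAMS listing in this document.

```python
def build_chat_payload(prompt, max_tokens=150, temperature=0.2, top_p=0.7,
                       extra_params=None):
    """Build a chat request body; extra_params are merged in last."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,  # generator-level setting, not an extra param
    }
    # Fall back to the documented default when no extra params are given.
    payload.update({"stream": False} if extra_params is None else extra_params)
    return payload


payload = build_chat_payload(
    "How many letters are in the word strawberry?",
    max_tokens=1024,
    extra_params={"stream": False, "model": "prefix/obsidianorder/terer-nor"},
)
print(sorted(payload))
```

Because extra_params is applied last in this sketch, a key placed there would override a generator-level value of the same name; whether garak behaves the same way is not confirmed here.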

Scaling

The NVCF generator supports parallelisation, and using it is recommended: invoke garak with --parallel_attempts set to a value higher than one. If an NVCF request times out due to insufficient capacity, garak will note this, back off, and retry the request later.

garak -t nvcf -n 341da0d0-aa68-4c4f-89b5-fc39286de6a1 --parallel_attempts 32

Or, as YAML config:

---
system:
   parallel_attempts: 32
plugins:
   target_type: nvcf.NvcfChat
   target_name: 341da0d0-aa68-4c4f-89b5-fc39286de6a1
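
The back-off-and-retry behaviour mentioned above can be pictured with a generic exponential backoff loop. This is a minimal sketch of the general technique, not garak's retry mechanism: CapacityTimeout, call_with_backoff, and all delay values are hypothetical names and numbers chosen for illustration.

```python
import time


class CapacityTimeout(Exception):
    """Hypothetical error signalling a timeout caused by insufficient capacity."""


def call_with_backoff(request, max_tries=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call `request()`, retrying with doubling delays on CapacityTimeout."""
    delay = base_delay
    for attempt in range(1, max_tries + 1):
        try:
            return request()
        except CapacityTimeout:
            if attempt == max_tries:
                raise  # out of attempts: surface the error to the caller
            sleep(delay)
            delay = min(delay * 2, max_delay)  # double the wait, up to a cap
```

The sleep function is injectable so the retry schedule can be checked without actually waiting.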

NVCF LLM interface

class NvcfChat(name=None, config_root=<module 'garak._config'>)

Bases: Generator

Wrapper for NVIDIA Cloud Functions Chat models via NGC. Expects NVCF_API_KEY environment variable.

Configurable parameters:

DEFAULT_PARAMS contents:

max_tokens = 150
temperature = 0.2
top_k = None
context_len = None
skip_seq_start = None
skip_seq_end = None
top_p = 0.7
status_uri_base = 'https://api.nvcf.nvidia.com/v2/nvcf/pexec/status/'
invoke_uri_base = 'https://api.nvcf.nvidia.com/v2/nvcf/pexec/functions/'
timeout = 60
version_id = None
stop_on_404 = True
extra_params = {'stream': False}

The values listed above are the defaults.

See also Configuring garak for how to set these values.

Other attributes:

ENV_VAR = 'NVCF_API_KEY'

generator_family_name = 'NVCF'

supports_multiple_generations = False

class NvcfCompletion(name=None, config_root=<module 'garak._config'>)

Bases: NvcfChat

Wrapper for NVIDIA Cloud Functions Completion models via NGC. Expects NVCF_API_KEY environment variable.