garak.generators.nvcf

This garak generator is a connector to NVIDIA Cloud Functions. It permits fast and flexible generation.

An NVCF function is called by sending a request to an invocation endpoint and then polling a status endpoint until the response is received. The cloud function is identified by a UUID, which is passed to garak as the model_name. The API key should be placed in the environment variable NVCF_API_KEY or set in a garak config. For example:

export NVCF_API_KEY="example-api-key-xyz"
garak -m nvcf -n 341da0d0-aa68-4c4f-89b5-fc39286de6a1
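
The invoke-then-poll flow described above can be sketched in Python as follows. This is a minimal illustration, not garak's actual implementation. It assumes that a pending request is signalled by HTTP status 202 and that the request ID arrives in an NVCF-REQID response header. The function name invoke_and_poll and the injected post and get callables are hypothetical, chosen so the logic can be read and run without network access.

```python
import time


def invoke_and_poll(post, get, invoke_url, status_base, payload,
                    poll_interval=1.0, max_polls=30):
    """Send one request, then poll the status endpoint until a result arrives.

    `post(url, payload)` and `get(url)` are caller-supplied callables that
    return a (status_code, headers, body) tuple. They are hypothetical
    stand-ins for real HTTP calls.
    """
    status, headers, body = post(invoke_url, payload)
    # Assumption: the request ID is returned in an "NVCF-REQID" header.
    request_id = headers.get("NVCF-REQID")
    polls = 0
    # Assumption: HTTP 202 means "accepted, result not ready yet".
    while status == 202:
        if request_id is None:
            raise RuntimeError("pending response carried no request ID")
        if polls >= max_polls:
            raise TimeoutError(f"no result after {max_polls} polls")
        time.sleep(poll_interval)
        status, headers, body = get(status_base + request_id)
        polls += 1
    if status != 200:
        raise RuntimeError(f"request failed with HTTP status {status}")
    return body
```

In real use, post and get would wrap an HTTP client and attach the Authorization header carrying the API key.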

Configuration

Configurable values:

  • temperature - Temperature for generation. Passed as a value to the endpoint.

  • top_p - Nucleus sampling threshold: the cumulative probability mass of the tokens to sample from. Passed as a value to the endpoint.

  • invoke_uri_base - Base URL for the NVCF endpoint (default is for NVIDIA-hosted functions).

  • status_uri_base - URL to check for request status updates (default is for NVIDIA-hosted functions).

  • timeout - Read timeout for HTTP requests (note: this is the network timeout, distinct from the inference timeout).

  • version_id - API version id, appended to endpoint URLs if supplied.

  • stop_on_404 - Give up on endpoints returning 404 (i.e. nonexistent ones).

  • extra_params - Dictionary of optional extra values to pass to the endpoint. Default {"stream": False}.
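
To illustrate how invoke_uri_base, the function UUID, and version_id might combine into a request URL, here is a minimal Python sketch. The helper build_invoke_uri is hypothetical, and the "/versions/" path segment is an assumption about how the version id is appended; only the default base URL is taken from this page.

```python
# Default taken from DEFAULT_PARAMS on this page.
DEFAULT_INVOKE_URI_BASE = "https://api.nvcf.nvidia.com/v2/nvcf/pexec/functions/"


def build_invoke_uri(function_id, invoke_uri_base=DEFAULT_INVOKE_URI_BASE,
                     version_id=None):
    """Join the base URL and function UUID, adding the version if supplied."""
    uri = invoke_uri_base + function_id
    if version_id is not None:
        # Assumption: the version id is appended as a "/versions/<id>" suffix.
        uri += f"/versions/{version_id}"
    return uri


print(build_invoke_uri("341da0d0-aa68-4c4f-89b5-fc39286de6a1"))
```

With no version_id, the result is the base URL followed directly by the function UUID.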

Some NVCF instances require custom parameters, for example a “model” field in the request body. These can be set in the NVCF config. For example, this cURL command maps to the following garak YAML:

curl -s -X POST 'https://api.nvcf.nvidia.com/v2/nvcf/pexec/functions/341da0d0-aa68-4c4f-89b5-fc39286de6a1' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer example-api-key-xyz' \
-d '{
      "messages": [{"role": "user", "content": "How many letters are in the word strawberry?"}],
      "model": "prefix/obsidianorder/terer-nor",
      "max_tokens": 1024,
      "stream": false
   }'

---
plugins:
   generators:
      nvcf:
         NvcfChat:
            api_key: example-api-key-xyz
            max_tokens: 1024
            extra_params:
               stream: false
               model: prefix/obsidianorder/terer-nor
   model_type: nvcf.NvcfChat
   model_name: 341da0d0-aa68-4c4f-89b5-fc39286de6a1

The nvcf generator uses the standard garak generator mechanism for max_tokens, which is why this value is set at generator-level rather than as a key-value pair in extra_params.
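
The split between generator-level values and extra_params can be pictured as a merge into a single request body. The Python sketch below is illustrative only: build_payload is a hypothetical helper, and the assumption that extra_params entries are copied over the generator-level values is not confirmed by this page. The defaults are those listed in DEFAULT_PARAMS.

```python
def build_payload(prompt, temperature=0.2, top_p=0.7, max_tokens=150,
                  extra_params=None):
    """Assemble a chat request body from generator settings and extras."""
    if extra_params is None:
        extra_params = {"stream": False}  # documented default
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,  # generator-level value, not an extra
    }
    # Assumption: extra_params are merged last, so they add to or override
    # the generator-level keys.
    payload.update(extra_params)
    return payload


body = build_payload(
    "How many letters are in the word strawberry?",
    max_tokens=1024,
    extra_params={"stream": False, "model": "prefix/obsidianorder/terer-nor"},
)
print(body)
```

The resulting dictionary carries the same "messages", "model", "max_tokens", and "stream" fields as the cURL request above, plus the temperature and top_p defaults.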

Scaling

The NVCF generator supports parallelisation, and using it is recommended: invoke garak with --parallel_attempts set to a value higher than one. If the NVCF function times out due to insufficient capacity, garak will note this, back off, and retry the request later.

garak -m nvcf -n 341da0d0-aa68-4c4f-89b5-fc39286de6a1 --parallel_attempts 32

Or, as YAML config:

---
system:
   parallel_attempts: 32
plugins:
   model_type: nvcf.NvcfChat
   model_name: 341da0d0-aa68-4c4f-89b5-fc39286de6a1
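
The back-off-and-retry behaviour mentioned in this section can be sketched as a generic exponential backoff loop. This page does not state garak's actual retry policy, so the delays, retry limit, and the names CapacityTimeout and call_with_backoff below are all hypothetical.

```python
import time


class CapacityTimeout(Exception):
    """Hypothetical error raised when the function has no spare capacity."""


def call_with_backoff(fn, max_tries=5, base_delay=1.0, factor=2.0,
                      sleep=time.sleep):
    """Call fn(), retrying on CapacityTimeout with growing delays."""
    delay = base_delay
    for attempt in range(1, max_tries + 1):
        try:
            return fn()
        except CapacityTimeout:
            if attempt == max_tries:
                raise  # out of retries: surface the error to the caller
            sleep(delay)
            delay *= factor  # wait longer before each further retry
```

Passing sleep as a parameter keeps the sketch easy to exercise without real waiting. With many parallel attempts, a policy along these lines spreads retries out instead of hitting an overloaded endpoint repeatedly.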

NVCF LLM interface

class garak.generators.nvcf.NvcfChat(name=None, config_root=garak._config)

Bases: Generator

Wrapper for NVIDIA Cloud Functions Chat models via NGC. Expects NVCF_API_KEY environment variable.

DEFAULT_PARAMS = {'context_len': None, 'extra_params': {'stream': False}, 'invoke_uri_base': 'https://api.nvcf.nvidia.com/v2/nvcf/pexec/functions/', 'max_tokens': 150, 'status_uri_base': 'https://api.nvcf.nvidia.com/v2/nvcf/pexec/status/', 'stop_on_404': True, 'temperature': 0.2, 'timeout': 60, 'top_k': None, 'top_p': 0.7, 'version_id': None}
ENV_VAR = 'NVCF_API_KEY'
generator_family_name = 'NVCF'
supports_multiple_generations = False

class garak.generators.nvcf.NvcfCompletion(name=None, config_root=garak._config)

Bases: NvcfChat

Wrapper for NVIDIA Cloud Functions Completion models via NGC. Expects NVCF_API_KEY environment variable.