Deploying Standalone Function Services

In addition to dispatching workflows, Covalent Cloud can deploy long-running services that expose custom APIs over HTTP. All the advanced compute hardware that’s available to workflow tasks is also available to function services.

Unlike normal workflow tasks, however, function services always stay running until their time limit expires, or until the user decides to tear them down.

This page covers the basics of deploying standalone function services. See Deploying Services from Workflows for an alternate approach. An introduction to interacting with deployed services can be found here.

The Basics

import covalent_cloud as cc

Compute resources

The following executor represents the minimum recommended resources for a function service.

service_executor = cc.CloudExecutor(env="my-env", num_cpus=4, memory="12GB")

This service_executor is used throughout this introductory section.

For high-compute applications like hosting large language models (LLMs), GPU resources are usually more appropriate. This corresponds to an executor more like the following.

h100_executor = cc.CloudExecutor(
    env="my-transformers-env",
    memory="56GB",
    num_cpus=4,
    num_gpus=1,
    gpu_type="h100",
    time_limit="30 minutes"
)

Note

👉 See here for a full-scale example on deploying a Llama 3 service for inference.

Defining the service initializer

Service creation centres on using the @cc.service decorator to wrap a special initializer function. This function runs only once, usually to load objects into memory for later use.

@cc.service(executor=service_executor)
def example_service(message="Hello, world!"):
    items = (1, 5, "foo")

    return {
        "message": message,
        "items": items
    }

Tip

Service initializers must always return a dictionary. Values in this dictionary are automatically substituted for any endpoint arguments that match its keys.

Attaching endpoints

Attaching endpoints to function services defines a custom API for making requests.

@example_service.endpoint("/hello")
def hello(message, number=42):
    """Dummy test endpoint"""
    return " ".join([message, f"My favorite number is {number}"])

In attached endpoints like this one, both the message and number parameters are populated by default, with the former supplied by the initializer’s return value.
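The substitution rule can be illustrated locally without deploying anything. The sketch below is not Covalent Cloud's implementation: call_endpoint is a hypothetical helper that fills an endpoint's parameters from the initializer's dictionary wherever the names match, and it assumes that values sent with a request take precedence over initializer values.

```python
import inspect


def call_endpoint(func, init_result, **request_kwargs):
    """Call func, filling parameters from the initializer dict, then from the request."""
    params = inspect.signature(func).parameters
    # Keep only initializer values whose keys match a parameter name.
    kwargs = {key: value for key, value in init_result.items() if key in params}
    # Assumption for this sketch: request values override initializer values.
    kwargs.update(request_kwargs)
    return func(**kwargs)


def hello(message, number=42):
    return " ".join([message, f"My favorite number is {number}"])


# Stand-in for the dictionary returned by the service initializer.
init_result = {"message": "Hello, world!", "items": (1, 5, "foo")}

default_response = call_endpoint(hello, init_result)
custom_response = call_endpoint(hello, init_result, number=7)
print(default_response)
print(custom_response)
```

Here "items" is ignored because hello has no parameter with that name, while "message" is filled in automatically.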

The /hello endpoint therefore responds with the following by default:

"Hello, world! My favorite number is 42"

Note

The return values of endpoint functions must be JSON-serializable.
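One way to catch serialization problems before deployment is to run an endpoint's return value through the standard json module. This is a local check only, assuming the service's serialization behaves like json.dumps; is_json_serializable is a hypothetical helper, not part of the SDK.

```python
import json


def is_json_serializable(value):
    """Return True if json.dumps accepts the value, False otherwise."""
    try:
        json.dumps(value)
        return True
    except (TypeError, ValueError):
        return False


ok_result = {"message": "Hello, world!", "items": [1, 5, "foo"]}
bad_result = {"items": {1, 5}}  # a set cannot be encoded as JSON
fixed_result = {"items": sorted(bad_result["items"])}  # convert to a list first

print(is_json_serializable(ok_result))
print(is_json_serializable(bad_result))
print(is_json_serializable(fixed_result))
```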

Attaching streaming endpoints

Attaching streaming endpoints requires setting streaming=True inside the endpoint decorator.

import time

@example_service.endpoint("/stream", streaming=True)
def simulate_stream(message, items):
    """Dummy test endpoint for streaming"""

    # Stream message letter by letter
    for letter in message:
        yield letter
        time.sleep(0.1)  # slow it down for demonstration

    yield "\n"

    # Stream items one by one
    yield "Items: "
    for item in items:
        yield str(item) + " "
        time.sleep(0.1)

    yield "\nDone streaming"

Tip

Streaming endpoint functions must always yield responses instead of returning.

Again, unless the message and items parameters here are overridden, the /stream endpoint responds with

Hello, world!
Items: 1 5 foo
Done streaming
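The streamed text is simply the concatenation of the yielded chunks. As a quick local check, the generator can be run as a plain function, without the decorator or the delays, and its chunks joined together. This sketch only mirrors the endpoint body above and does not contact a deployed service.

```python
def simulate_stream(message, items):
    """Plain-generator copy of the streaming endpoint, without time.sleep calls."""
    for letter in message:
        yield letter
    yield "\n"
    yield "Items: "
    for item in items:
        yield str(item) + " "
    yield "\nDone streaming"


# Same values the initializer would supply by default.
chunks = list(simulate_stream("Hello, world!", (1, 5, "foo")))
full_text = "".join(chunks)

print(len(chunks), "chunks")
print(full_text)
```

Each yielded string is one chunk, so a client that reads the stream incrementally sees the message arrive letter by letter.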

See Interacting with Service Endpoints for more information on live-streaming endpoint responses.

Deployment

Deploying function services means hosting them on Covalent Cloud backends and obtaining a unique URL for making HTTP requests. Deployment happens via the Covalent Cloud SDK, using either the cc.deploy() function or an in-workflow deployment.

The cc.deploy() function is designed for standalone deployments (outside Covalent workflows). It returns an object that serves as a Python client for interacting with the service.

client = cc.deploy(example_service)()
client = cc.get_deployment(client, wait=True)  # wait for ACTIVE state
print(client)

╭──────────────────────── Deployment Information ────────────────────────╮
│  Name          example_service                                         │
│  Description   Service description goes here                           │
│  Function ID   663accb6f7d37dbf2a4687b7                                │
│  Address       https://fn.prod.covalent.xyz/0663accb6f7d37dbf2a4687b7  │
│  Status        ACTIVE                                                  │
│  Tags                                                                  │
│  Auth Enabled  Yes                                                     │
╰────────────────────────────────────────────────────────────────────────╯
╭────────────────────────────────────╮
│            POST /hello             │
│  Streaming    No                   │
│  Description  Dummy test endpoint  │
╰────────────────────────────────────╯
╭──────────────────────────────────────────────────╮
│                   POST /stream                   │
│  Streaming    Yes                                │
│  Description  Dummy test endpoint for streaming  │
╰──────────────────────────────────────────────────╯
Authorization token: <authorization-token>

Authorization tokens

A single authorization token is generated by default for every function service deployment. It is possible (though not generally recommended) to disable this feature by setting auth=False inside @cc.service.

Note

A service with an auth=False setting is accessible to anyone who knows the address.

Management

When enabled, authorization tokens are useful for granting third-party access to private function services. These tokens do not grant teardown privileges; teardown is reserved for the owner of the Covalent Cloud API key. Other administrative actions, such as deleting or adding authorization tokens, are only possible through the Covalent Cloud UI.

Teardown

Teardown is the opposite of deployment—it deletes the function service and frees up the associated compute resources. Teardown can be initiated from the Covalent Cloud UI or using the Python client’s .teardown() method.

client.teardown()

'Teardown initiated asynchronously.'

Note that teardown happens automatically after the service executor’s time_limit expires.

Information Endpoint

Every function service includes a built-in POST /info endpoint and every service client includes a corresponding .info() method. This endpoint returns a JSON response containing information about the function service, as summarized in the following fields:

Key	Description
`start_date`	Datetime string of the service’s start time
`up_time`	Time elapsed since the service started
`init_args`	String representation of the service initializer's positional arguments
`init_kwargs`	String representation of the service initializer's keyword arguments
`api`	An API specification

Format

API specifications are provided in the format below, where name and description are the service’s name and docstring, respectively.

The paths field describes endpoints available on the function service. A service with only a single /generate endpoint, for example, might have an api field that looks something like this:

{
    "name": "My Service",
    "description": "Service docstring appears here.",
    "paths": {
        "/generate": {
            "method": "POST",
            "streaming": false,
            "description": "Endpoint docstring appears here.",
            "parameters": {
                "model": {
                    "type": "LlamaForCausalLM",
                    "default": "LlamaForCausalLM(\n  (model): [etc.]",
                    "auto": true
                },
                "age": {
                    "type": "int",
                    "default": 30,
                    "auto": false
                }
            },
            "var_kwargs": false,
            "return_type": "Dict[str, Any]"
        }
    }
}

Endpoint parameters for each endpoint path are described in terms of their Python type, default value, and the auto field, which indicates whether the parameter is automatically populated by the service initializer.
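Because the auto field separates initializer-supplied parameters from caller-supplied ones, a client can use the specification to work out what it needs to send. The sketch below assumes the api field has already been obtained as a Python dict shaped like the sample above (trimmed here, with the model's default abbreviated); caller_parameters is a hypothetical helper.

```python
# Trimmed copy of the sample specification, written as a Python dict.
api = {
    "name": "My Service",
    "paths": {
        "/generate": {
            "method": "POST",
            "streaming": False,
            "parameters": {
                "model": {
                    "type": "LlamaForCausalLM",
                    "default": "LlamaForCausalLM(...)",  # abbreviated placeholder
                    "auto": True,
                },
                "age": {"type": "int", "default": 30, "auto": False},
            },
        }
    },
}


def caller_parameters(api_spec):
    """Map each path to the parameters (and defaults) not filled by the initializer."""
    return {
        path: {
            name: info["default"]
            for name, info in details["parameters"].items()
            if not info["auto"]
        }
        for path, details in api_spec["paths"].items()
    }


needed = caller_parameters(api)
print(needed)
```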

Any non-serializable values for parameter default fields are replaced by their string representation, truncated to 500 characters. (Samples here are shortened even further for neatness.)
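The replacement rule can be sketched as follows. This is an illustration of the described behavior, not the service's actual code: describe_default and FakeModel are hypothetical, and the use of repr() for the "string representation" is an assumption.

```python
import json

MAX_CHARS = 500  # truncation length stated in the text above


def describe_default(value, limit=MAX_CHARS):
    """Return the value if JSON-serializable, else its truncated string form."""
    try:
        json.dumps(value)
        return value
    except (TypeError, ValueError):
        # Assumption: repr() stands in for the "string representation".
        return repr(value)[:limit]


class FakeModel:
    """Stand-in for a large, non-serializable object such as a loaded model."""

    def __repr__(self):
        return "FakeModel(" + "x" * 1000 + ")"


serializable_default = describe_default(30)
model_default = describe_default(FakeModel())

print(serializable_default)
print(len(model_default))
```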

API information inside endpoints

For more advanced applications, the content of the API specification is optionally available as a Python dict to any endpoint function that includes an _api argument in its signature. Since only the owner of a function service can access the /info endpoint, the _api argument provides a means of exposing custom endpoints that can utilize and/or communicate this information to non-admin users (i.e. without requiring the owner's Covalent Cloud API key) or other services.

Below is an example of such an endpoint, which simply returns a copy of the API spec:

@my_service.endpoint("/get-api")
def access_spec(_api) -> dict:
    """Get the API spec for the service."""
    d = _api.copy()
    return d
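An endpoint does not have to return the whole specification. As a sketch, the body of such an endpoint could instead return a reduced summary that leaves out parameter defaults. summarize_api below is a hypothetical helper operating on a dict shaped like the api field shown earlier; it is written as a plain function so it can be tried locally.

```python
def summarize_api(api_spec):
    """Return the service name plus one short entry per endpoint path."""
    return {
        "name": api_spec.get("name"),
        "endpoints": [
            {
                "path": path,
                "method": details.get("method"),
                "streaming": details.get("streaming"),
                "description": details.get("description"),
            }
            for path, details in api_spec.get("paths", {}).items()
        ],
    }


# Sample input modeled on the format described above.
sample_api = {
    "name": "My Service",
    "description": "Service docstring appears here.",
    "paths": {
        "/generate": {
            "method": "POST",
            "streaming": False,
            "description": "Endpoint docstring appears here.",
            "parameters": {"age": {"type": "int", "default": 30, "auto": False}},
        }
    },
}

summary = summarize_api(sample_api)
print(summary)
```

Inside a deployed service, the same function could be called with the _api argument and its result returned, since the summary contains only JSON-serializable values.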
