
Deploying Standalone Function Services

In addition to dispatching workflows, Covalent Cloud can deploy long-running services with custom APIs for making HTTP requests. All the advanced compute hardware that’s available to workflow tasks is also available to function services.

Unlike normal workflow tasks, however, function services always stay running until their time limit expires, or until the user decides to tear them down.

This page covers the basics of deploying standalone function services. See Deploying Services from Workflows for an alternate approach. An introduction to interacting with deployed services can be found here.

The Basics

import covalent_cloud as cc

Compute resources

The following executor represents the minimum recommended resources for a function service.

service_executor = cc.CloudExecutor(env="my-env", num_cpus=6, memory="12GB")

This service_executor is used throughout this introductory section.

For high-compute applications like hosting large language models (LLMs), GPU resources are usually more appropriate. This corresponds to an executor more like the following.

h100_executor = cc.CloudExecutor(
    env="my-transformers-env",
    memory="56GB",
    num_cpus=8,
    num_gpus=1,
    gpu_type="h100",
    time_limit="30 minutes"
)

Note

👉 See here for a full-scale example on deploying a Llama3 service for inference.

Defining the service initializer

Service creation centres on using the @cc.service decorator to wrap a special initializer function. This function runs only once, usually to load objects into memory for later re-use.

@cc.service(executor=service_executor)
def example_service(message="Hello, world!"):
    items = (1, 5, "foo")

    return {
        "message": message,
        "items": items
    }

Tip

Service initializers must always return a dictionary. Values in this dictionary are automatically substituted for any endpoint arguments that match its keys.

Attaching endpoints

Attaching endpoints to function services defines a custom API for making requests.

@example_service.endpoint("/hello")
def hello(message, number=42):
    return " ".join([message, f"My favorite number is {number}"])

In attached endpoints like this one, both parameters are populated by default: message takes its value from the initializer’s return dictionary, and number falls back to its own default of 42.

The /hello endpoint above therefore responds with the following, unless those parameters are overridden.

"Hello, world! My favorite number is 42"
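The substitution rule described above can be pictured with a small, purely local sketch. This is not Covalent Cloud's implementation; the helper bind_endpoint_args is hypothetical and only illustrates the idea that initializer values fill matching parameter names, while explicit request arguments take precedence.

```python
import inspect


def bind_endpoint_args(endpoint_fn, init_values, request_kwargs):
    """Hypothetical helper: merge initializer values with request arguments."""
    params = inspect.signature(endpoint_fn).parameters
    # Keep only initializer values whose keys match a parameter name.
    kwargs = {k: v for k, v in init_values.items() if k in params}
    # Explicit request arguments override initializer values.
    kwargs.update(request_kwargs)
    return kwargs


def hello(message, number=42):
    return " ".join([message, f"My favorite number is {number}"])


# Stand-in for the dictionary returned by the service initializer.
init_values = {"message": "Hello, world!", "items": (1, 5, "foo")}

default_response = hello(**bind_endpoint_args(hello, init_values, {}))
overridden_response = hello(**bind_endpoint_args(hello, init_values, {"number": 7}))
```

Here "items" is ignored because hello has no parameter with that name, and the second call shows a request overriding number.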

Note

The return values of endpoint functions must be JSON-serializable.
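One way to check a candidate return value before deploying is to try encoding it locally with Python's standard json module. The helper below, is_json_serializable, is a hypothetical convenience and not part of the Covalent Cloud SDK; the service may apply its own encoding rules.

```python
import json


def is_json_serializable(value):
    """Return True if the standard json encoder accepts the value."""
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        # TypeError: unsupported type; ValueError: e.g. circular reference.
        return False
    return True


ok_string = is_json_serializable("Hello, world! My favorite number is 42")
ok_nested = is_json_serializable({"items": [1, 5, "foo"]})
bad_set = is_json_serializable({1, 5})  # sets are not JSON-serializable
```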

Attaching streaming endpoints

Attaching streaming endpoints requires setting streaming=True inside the endpoint decorator.

import time

@example_service.endpoint("/stream", streaming=True)
def simulate_stream(message, items):
    """Dummy test endpoint for streaming"""

    # Stream message letter by letter
    for letter in message:
        yield letter
        time.sleep(0.1)  # slow it down for demonstration

    yield "\n"

    # Stream items one by one
    yield "Items: "
    for item in items:
        yield str(item) + " "
        time.sleep(0.1)

    yield "\nDone streaming"

Tip

Streaming endpoint functions must always yield responses instead of returning.

Again, unless the message and items parameters here are overridden, the /stream endpoint responds with

Hello, world!
Items: 1 5 foo
Done streaming

See Interacting with Service Endpoints for more information on live-streaming endpoint responses.
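To see how the yielded chunks combine into the response shown above, the generator logic can be exercised locally without any deployment. The sketch below uses an undecorated copy of the endpoint body with the sleeps removed; collect_stream is a hypothetical helper that stands in for a client reading chunks as they arrive.

```python
def simulate_stream_local(message, items):
    """Undecorated, sleep-free copy of the endpoint body for local testing."""
    for letter in message:
        yield letter
    yield "\n"
    yield "Items: "
    for item in items:
        yield str(item) + " "
    yield "\nDone streaming"


def collect_stream(chunks):
    """Hypothetical helper: consume chunks one at a time and join them."""
    parts = []
    for chunk in chunks:
        # A live client would typically display each chunk as it arrives.
        parts.append(chunk)
    return "".join(parts)


full_text = collect_stream(simulate_stream_local("Hello, world!", (1, 5, "foo")))
```

Note that each item is followed by a space, so the second line of the assembled text ends with a trailing space before the final newline.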

Deployment

Deploying function services means hosting them on Covalent Cloud backends and obtaining a unique URL for making HTTP requests. Deployment happens via the Covalent Cloud SDK, using either the cc.deploy() function or an in-workflow deployment.

The cc.deploy() function is designed for standalone deployments (outside Covalent workflows). It returns an object that serves as a Python client for interacting with the service.

client = cc.deploy(example_service)()
client = cc.get_deployment(client, wait=True) # wait for ACTIVE state
print(client)
╭──────────────────────── Deployment Information ────────────────────────╮
│ Name example_service │
│ Description "Service description goes here │
│ Function ID 663accb6f7d37dbf2a4687b7 │
│ Address https://fn.prod.covalent.xyz/0663accb6f7d37dbf2a4687b7 │
│ Status ACTIVE │
│ Tags │
│ Auth Enabled Yes │
╰────────────────────────────────────────────────────────────────────────╯
╭────────────────────────────────────╮
│ POST /hello │
│ Streaming No │
│ Description Dummy test endpoint │
╰────────────────────────────────────╯
╭──────────────────────────────────────────────────╮
│ POST /stream │
│ Streaming Yes │
│ Description Dummy test endpoint for streaming │
╰──────────────────────────────────────────────────╯
Authorization token: <authorization-token>

Authorization tokens

A single authorization token is generated by default for every function service deployment. It is possible (though not generally recommended) to disable this feature by setting auth=False inside @cc.service. Services with an auth=False setting are accessible to anyone who has the address.

Management

When enabled, authorization tokens are useful for granting third-party access to private function services. These tokens do not grant teardown privileges, which is an action reserved for the owner of the Covalent Cloud API key. Other administrative actions, such as deleting or adding authorization tokens, are only possible through the Covalent Cloud UI.
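A third party holding a token would typically attach it to each HTTP request. The sketch below only constructs a POST request with the standard library and does not send it. The header name x-api-key is an assumption that this page does not confirm, and build_endpoint_request is a hypothetical helper; the address and token are the placeholders from the deployment output above.

```python
import json
import urllib.request


def build_endpoint_request(address, route, token, payload):
    """Hypothetical helper: build (but do not send) a POST request."""
    url = address.rstrip("/") + route
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # ASSUMPTION: the header name is not confirmed by this page.
        "x-api-key": token,
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_endpoint_request(
    "https://fn.prod.covalent.xyz/0663accb6f7d37dbf2a4687b7",
    "/hello",
    "<authorization-token>",
    {"number": 7},
)
# Sending would be done with urllib.request.urlopen(request); omitted here.
```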

Teardown

Teardown is the opposite of deployment—it deletes the function service and frees up the associated compute resources. Teardown can be initiated from the Covalent Cloud UI or using the Python client’s teardown() method.

client.teardown()
'Teardown initiated asynchronously.'

Note that teardown happens automatically after the service executor’s time_limit expires.