Deploying Standalone Function Services
In addition to dispatching workflows, Covalent Cloud can deploy long-running services with custom APIs for making HTTP requests. All the advanced compute hardware that’s available to workflow tasks is also available to function services.
Unlike normal workflow tasks, however, function services always stay running until their time limit expires, or until the user decides to tear them down.
This page covers the basics of deploying standalone function services. See Deploying Services from Workflows for an alternate approach. An introduction to interacting with deployed services can be found here.
The Basics
import covalent_cloud as cc
Compute resources
The following executor represents the minimum recommended resources for a function service.
service_executor = cc.CloudExecutor(env="my-env", num_cpus=4, memory="12GB")
This service_executor is used throughout this introductory section.
For high-compute applications like hosting large language models (LLMs), GPU resources are usually more appropriate. This corresponds to an executor more like the following.
h100_executor = cc.CloudExecutor(
    env="my-transformers-env",
    memory="56GB",
    num_cpus=4,
    num_gpus=1,
    gpu_type="h100",
    time_limit="30 minutes"
)
Note
👉 See here for a full-scale example on deploying a Llama 3 service for inference.
Defining the service initializer
Service creation centres on using the @cc.service decorator to wrap a special initializer function. This function runs only once, usually to load objects into memory for later use.
@cc.service(executor=service_executor)
def example_service(message="Hello, world!"):
    items = (1, 5, "foo")
    return {
        "message": message,
        "items": items
    }
Tip
Service initializers must always return a dictionary. Values in this dictionary are automatically substituted for any endpoint arguments that match its keys.
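To make the substitution rule concrete, here is a plain-Python sketch of how matching keys could be filled in. It is an illustration only, not Covalent Cloud's implementation: the helper call_with_defaults is hypothetical, and the rule that request values take precedence over initializer values is an assumption based on the "overridden" wording used on this page.

```python
import inspect

def call_with_defaults(endpoint_fn, init_result, request_kwargs):
    """Hypothetical helper: fill endpoint arguments from the initializer's dict."""
    params = inspect.signature(endpoint_fn).parameters
    # Start from initializer values whose keys match the endpoint's parameter names
    kwargs = {k: v for k, v in init_result.items() if k in params}
    # Assumption: values sent with the request override initializer values
    kwargs.update(request_kwargs)
    return endpoint_fn(**kwargs)

def hello(message, number=42):
    return " ".join([message, f"My favorite number is {number}"])

# Same dictionary that the example initializer returns
init_result = {"message": "Hello, world!", "items": (1, 5, "foo")}

print(call_with_defaults(hello, init_result, {}))
print(call_with_defaults(hello, init_result, {"number": 7}))
```

In this sketch the "items" key is simply ignored for hello, because that function has no parameter with that name.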
Attaching endpoints
Attaching endpoints to function services defines a custom API for making requests.
@example_service.endpoint("/hello")
def hello(message, number=42):
    return " ".join([message, f"My favorite number is {number}"])
In attached endpoints like this one, both the message and number parameters are populated by default, with the former supplied by the initializer’s return value.
The /hello endpoint therefore responds with the following by default:
"Hello, world! My favorite number is 42"
Note
The return values of endpoint functions must be JSON-serializable.
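One way to catch serialization problems before deploying is to test a sample return value locally. The sketch below uses only the standard library; is_json_serializable is a hypothetical helper, and it assumes that a value accepted by json.dumps is a reasonable proxy for what the service can return.

```python
import json

def is_json_serializable(value):
    """Return True if the standard json module can encode the value."""
    try:
        json.dumps(value)
        return True
    except (TypeError, ValueError):
        # TypeError: unsupported type (e.g. set); ValueError: circular reference
        return False

print(is_json_serializable({"text": "ok", "scores": [0.1, 0.9]}))
print(is_json_serializable({"tags": {"a", "b"}}))  # a set is not JSON-serializable
```

Objects such as sets, NumPy arrays, or model instances would typically need converting to lists, numbers, or strings before being returned.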
Attaching streaming endpoints
Attaching streaming endpoints requires setting streaming=True inside the endpoint decorator.
import time

@example_service.endpoint("/stream", streaming=True)
def simulate_stream(message, items):
    """Dummy test endpoint for streaming"""
    # Stream message letter by letter
    for letter in message:
        yield letter
        time.sleep(0.1)  # slow it down for demonstration
    yield "\n"
    # Stream items one by one
    yield "Items: "
    for item in items:
        yield str(item) + " "
        time.sleep(0.1)
    yield "\nDone streaming"
Tip
Streaming endpoint functions must always yield responses instead of returning.
Again, unless the message and items parameters here are overridden, the /stream endpoint responds with:
Hello, world!
Items: 1 5 foo
Done streaming
See Interacting with Service Endpoints for more information on live-streaming endpoint responses.
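Because a streaming endpoint is an ordinary Python generator, its logic can be checked locally before deployment. The sketch below is an undecorated copy of the generator above (the name simulate_stream_local is hypothetical, and the sleeps are dropped), called with the values that the example initializer returns.

```python
def simulate_stream_local(message, items):
    """Local, undecorated copy of the streaming logic (no delays)."""
    for letter in message:
        yield letter
    yield "\n"
    yield "Items: "
    for item in items:
        yield str(item) + " "
    yield "\nDone streaming"

# Collect every chunk the generator yields, then join them into the full response
chunks = list(simulate_stream_local("Hello, world!", (1, 5, "foo")))
full_text = "".join(chunks)
print(full_text)
```

Each element of chunks corresponds to one yielded piece of the response, so the list also shows the granularity a streaming consumer would see.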
Deployment
Deploying function services means hosting them on Covalent Cloud backends and obtaining a unique URL for making HTTP requests. Deployment happens via the Covalent Cloud SDK, using either the cc.deploy() function or using an in-workflow deployment.
The cc.deploy() function is designed for standalone deployments (outside Covalent workflows). It returns an object that serves as a Python client for interacting with the service.
client = cc.deploy(example_service)()
client = cc.get_deployment(client, wait=True) # wait for ACTIVE state
print(client)
╭──────────────────────── Deployment Information ────────────────────────╮
│ Name example_service │
│ Description "Service description goes here │
│ Function ID 663accb6f7d37dbf2a4687b7 │
│ Address https://fn.prod.covalent.xyz/0663accb6f7d37dbf2a4687b7 │
│ Status ACTIVE │
│ Tags │
│ Auth Enabled Yes │
╰────────────────────────────────────────────────────────────────────────╯
╭────────────────────────────────────╮
│ POST /hello │
│ Streaming No │
│ Description Dummy test endpoint │
╰────────────────────────────────────╯
╭──────────────────────────────────────────────────╮
│ POST /stream │
│ Streaming Yes │
│ Description Dummy test endpoint for streaming │
╰──────────────────────────────────────────────────╯
Authorization token: <authorization-token>
Authorization tokens
A single authorization token is generated by default for every function service deployment. It is possible (though not generally recommended) to disable this feature by setting auth=False inside @cc.service.
Note
A service with an auth=False setting is accessible to anyone who knows the address.
Management
When enabled, authorization tokens are useful for granting third-party access to private function services. These tokens do not grant teardown privileges, which is an action reserved for the owner of the Covalent Cloud API key. Other administrative actions, such as deleting or adding authorization tokens, are only possible through the Covalent Cloud UI.
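As a rough sketch of how a third party might use a token, the snippet below prepares (but does not send) a POST request with the standard library. Several things here are assumptions rather than documented facts: the header name "x-api-key", the JSON request body, the helper build_endpoint_request, and the placeholder address. Check the page on interacting with service endpoints for the actual request format.

```python
import json
import urllib.request

def build_endpoint_request(address, path, token, payload, token_header="x-api-key"):
    """Hypothetical helper: prepare a POST request to a service endpoint.

    The default token_header value is an assumption, not a confirmed header name.
    """
    url = address.rstrip("/") + path
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json", token_header: token}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

request = build_endpoint_request(
    address="https://example.invalid/my-service",  # placeholder, not a real deployment
    path="/hello",
    token="<authorization-token>",
    payload={"number": 7},
)
print(request.get_method(), request.full_url)
```

Passing the prepared object to urllib.request.urlopen would send it; that step is left out so the sketch runs without network access.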
Teardown
Teardown is the opposite of deployment: it deletes the function service and frees up the associated compute resources. Teardown can be initiated from the Covalent Cloud UI or using the Python client’s .teardown() method.
client.teardown()
'Teardown initiated asynchronously.'
Note that teardown happens automatically after the service executor’s time_limit expires.
Information Endpoint
Every function service includes a built-in POST /info endpoint and every service client includes a corresponding .info() method. This endpoint returns a JSON response containing information about the function service, as summarized in the following fields:
| Key | Description |
|---|---|
| start_date | Datetime string of the service’s start time |
| up_time | Time elapsed since the service started |
| init_args | String representation of the service initializer's positional arguments |
| init_kwargs | String representation of the service initializer's keyword arguments |
| api | An API specification |
Format
API specifications are provided in the format below, where name and description are the service’s name and docstring, respectively.
The paths field describes endpoints available on the function service. A service with only a single /generate endpoint, for example, might have an api field that looks something like this:
{
    "name": "My Service",
    "description": "Service docstring appears here.",
    "paths": {
        "/generate": {
            "method": "POST",
            "streaming": false,
            "description": "Endpoint docstring appears here.",
            "parameters": {
                "model": {
                    "type": "LlamaForCausalLM",
                    "default": "LlamaForCausalLM(\n (model): [etc.]",
                    "auto": true
                },
                "age": {
                    "type": "int",
                    "default": 30,
                    "auto": false
                }
            },
            "var_kwargs": false,
            "return_type": "Dict[str, Any]"
        }
    }
}
Endpoint parameters for each endpoint path are described in terms of their Python type, default value, and the auto field, which indicates whether the parameter is automatically populated by the service initializer.
Any non-serializable values for parameter default fields are replaced by their string representation, truncated to 500 characters. (Samples here are shortened even further for neatness.)
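A specification in this format is straightforward to inspect programmatically. The sketch below hard-codes a Python version of the sample above (in practice the dict would come from the /info response) and uses a hypothetical helper, caller_parameters, to list the parameters that are not auto-populated by the initializer.

```python
# Python equivalent of the sample "api" field shown above
api_spec = {
    "name": "My Service",
    "description": "Service docstring appears here.",
    "paths": {
        "/generate": {
            "method": "POST",
            "streaming": False,
            "description": "Endpoint docstring appears here.",
            "parameters": {
                "model": {
                    "type": "LlamaForCausalLM",
                    "default": "LlamaForCausalLM(\n (model): [etc.]",
                    "auto": True,
                },
                "age": {"type": "int", "default": 30, "auto": False},
            },
            "var_kwargs": False,
            "return_type": "Dict[str, Any]",
        }
    },
}

def caller_parameters(spec):
    """Map each endpoint path to the parameters (and types) with auto set to false."""
    return {
        path: {
            name: info["type"]
            for name, info in details["parameters"].items()
            if not info["auto"]
        }
        for path, details in spec["paths"].items()
    }

print(caller_parameters(api_spec))
```

For the sample spec, only age remains, since model is marked as auto-populated.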
API information inside endpoints
For more advanced applications, the content of the API specification is optionally available as a Python dict to any endpoint function that includes an _api argument in its signature. Since only the owner of a function service can access the /info endpoint, the _api argument provides a means of exposing custom endpoints that can utilize and/or communicate this information to non-admin users (i.e. without requiring the owner's Covalent Cloud API key) or other services.
Below is an example of such an endpoint, which simply returns a copy of the API spec:
@my_service.endpoint("/get-api")
def access_spec(_api) -> dict:
    """Get the API spec for the service."""
    d = _api.copy()
    return d
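An endpoint that exposes the spec to other users may want to trim it first. The sketch below is a plain function (not yet attached to a service) that drops the default field of auto-populated parameters, which can contain long object representations. The helper name redact_auto_defaults and the sample dict are hypothetical. It uses copy.deepcopy because dict.copy() is shallow, so editing nested entries of a shallow copy would also change the original dict.

```python
import copy

def redact_auto_defaults(_api):
    """Return a deep copy of the spec without defaults for auto-populated parameters."""
    spec = copy.deepcopy(_api)  # deep copy so nested dicts in _api are left untouched
    for details in spec.get("paths", {}).values():
        for param in details.get("parameters", {}).values():
            if param.get("auto"):
                param.pop("default", None)
    return spec

# Hypothetical sample, shaped like the API specification format described above
sample_api = {
    "name": "My Service",
    "paths": {
        "/generate": {
            "parameters": {
                "model": {"type": "LlamaForCausalLM", "default": "LlamaForCausalLM(...)", "auto": True},
                "age": {"type": "int", "default": 30, "auto": False},
            }
        }
    },
}

public_spec = redact_auto_defaults(sample_api)
print(public_spec["paths"]["/generate"]["parameters"])
```

Inside a service, the same body could sit in an endpoint function whose signature includes _api.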