Defining Compute Resources

Covalent Cloud keeps the definition of compute resources simple. As always in Covalent, resources can be assigned on a task-by-task basis.

Defining compute resources starts with creating Cloud Executors to encapsulate any desired task execution parameters. This includes compute resources (i.e. vCPUs, GPUs, memory, storage) as well as the software environment (i.e. Python version, Python packages, and any other libraries).

Cloud Executors

The default cloud executor uses the following parameters.

| parameter | type | default value | interpretation of default value |
|---|---|---|---|
| num_cpus | int | 1 | task execution uses 1 vCPU |
| memory | int or str | 1024 | task execution uses 1024 MB of RAM |
| num_gpus | int | 0 | task execution uses no GPUs |
| gpu_type | str | '' | GPU type not specified (necessary when num_gpus > 0) |
| env | str | 'default' | task executes in the user’s default software environment |
| time_limit | int, str, or timedelta | 1800 | task execution will be cancelled after 30 minutes |

To use the default parameters, instantiate a CloudExecutor without any arguments.

import covalent as ct
import covalent_cloud as cc

default_exec = cc.CloudExecutor() # 1 vCPU, 1024 MB memory, 0 GPUs, ...

@ct.electron(executor=default_exec)
def lightweight_task(*inputs):
    # Execute using 1 vCPU, 1024 MB memory, 0 GPUs, ...
    ...

With this simple setup, you're all set to run tasks with the specified resources on the cloud.

Software Environments

Tip

Environments can be created in the Covalent Cloud UI or programmatically with cc.create_env(). See this guide for more on creating software environments.

The executor’s env parameter must refer to an existing software environment in the user’s account. Tasks that use an executor with an invalid env parameter will terminate with an error.

The name of any existing environment is a valid value for this parameter. It is also acceptable to use the same environment in multiple Cloud Executors.

torch_exec = cc.CloudExecutor(
    env="pytorch-testing",
    num_cpus=4,
)

@ct.electron(executor=torch_exec)
def data_processing(*inputs):
    # Execute using 4 vCPUs, 1024 MB memory, 0 GPUs, ...
    ...

torch_exec_A100 = cc.CloudExecutor(
    env="pytorch-testing",
    num_cpus=4,
    num_gpus=2,
    gpu_type="a100",
    memory="12GB",
    time_limit="in 3 hours",
)

@ct.electron(executor=torch_exec_A100)
def train_model(*parameters):
    # Execute using 4 vCPUs, 12 GB memory, 2xA100 GPUs, ...
    ...

CPUs, GPUs, and memory

The num_cpus parameter must be a positive int that indicates the number of vCPUs to make available to a task. The num_gpus parameter similarly indicates the number of GPUs to make available. The number of GPUs can be (and is by default) 0, whereas the number of vCPUs must be at least 1. When an executor specifies one or more GPUs, the gpu_type parameter indicates the type of GPU to use.
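The rules above can be checked locally before an executor is created. The following is only a minimal sketch: check_basic_resources is a hypothetical helper, not part of covalent_cloud, and it encodes nothing beyond the constraints stated in this section.

```python
def check_basic_resources(num_cpus=1, num_gpus=0, gpu_type=""):
    """Return a list of problems with the requested resources (empty if none).

    Hypothetical helper, not a covalent_cloud function. It mirrors only the
    rules described in the text: at least 1 vCPU, 0 or more GPUs, and a
    gpu_type whenever num_gpus > 0.
    """
    problems = []
    if not isinstance(num_cpus, int) or num_cpus < 1:
        problems.append("num_cpus must be an int >= 1")
    if not isinstance(num_gpus, int) or num_gpus < 0:
        problems.append("num_gpus must be an int >= 0")
    elif num_gpus > 0 and not gpu_type:
        problems.append("gpu_type is required when num_gpus > 0")
    return problems


print(check_basic_resources())                        # defaults: no problems
print(check_basic_resources(num_cpus=0))              # too few vCPUs
print(check_basic_resources(num_cpus=4, num_gpus=2))  # GPUs without a gpu_type
```

A check like this only catches obvious mistakes early; the service itself remains the authority on which requests are accepted.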

The memory parameter indicates the amount of RAM that a task can use. Integer values for this parameter are always interpreted as megabytes. Memory can also be specified in units of GB or GiB (as well as MB) with a string value, e.g. memory="32GB".
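To make the int-versus-string convention concrete, here is a small sketch that normalizes a memory value to megabytes. memory_in_mb is a hypothetical helper, not how Covalent Cloud parses the parameter internally. The conversion factors are an assumption: this page does not say whether GB is treated as 1000 MB or 1024 MB, so the sketch uses SI units for GB and binary units for GiB.

```python
import re

# Assumed conversion factors to (decimal) megabytes. These are NOT taken from
# Covalent Cloud; they follow the usual SI (GB) and binary (GiB) definitions.
_UNIT_TO_MB = {
    "MB": 1.0,
    "GB": 1000.0,
    "GIB": 1024**3 / 1000**2,
}


def memory_in_mb(memory):
    """Normalize an int (megabytes) or a string like '32GB' to megabytes."""
    if isinstance(memory, int):
        # Integer values are always interpreted as megabytes.
        return float(memory)
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*", memory)
    if match is None:
        raise ValueError(f"cannot parse memory value {memory!r}")
    value, unit = match.groups()
    factor = _UNIT_TO_MB.get(unit.upper())
    if factor is None:
        raise ValueError(f"unsupported memory unit {unit!r}")
    return float(value) * factor


print(memory_in_mb(1024))    # the default value, already in MB
print(memory_in_mb("32GB"))
print(memory_in_mb("1GiB"))
```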

Note that the limits on vCPUs, GPU counts, and memory vary with the GPU type.

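| gpu_type | GPU Type | Max num_cpus | num_gpus | Max memory (GB) | supports cc.volume |
|---|---|---|---|---|---|
| "h100" | H100 80GB | 252 | 1,2,4,8 | 1440 | no |
| "a100-80g" | A100 80GB | 252 | 1,2,4,8 | 960 | no |
| "v100" | V100 | 96 | 1,4,8 | 825 | yes |
| "l40" | L40 | 252 | 1,2,4,8 | 480 | no |
| "a10" | A10G | 192 | 1,4,8 | 825 | yes |
| "a6000" | RTX A6000 | 128 | 1,2,4,8 | 480 | no |
| "a4000" | RTX A4000 | 64 | 1,2,4,8,10 | 240 | no |
| "a5000" | RTX A5000 | 64 | 1,2,4,8 | 240 | no |
| "t4" | T4 | 96 | 1,4,8 | 412 | yes |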
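The limits in the table above can be used for a local pre-flight check. The sketch below transcribes the table into a dictionary and compares a request against it. GPU_LIMITS and check_gpu_request are hypothetical names, not part of covalent_cloud. The values are a snapshot of this page and may go out of date, and the sketch assumes the maxima apply regardless of how many GPUs are requested, which the table does not state.

```python
# Limits transcribed from the table above (a snapshot, may go out of date):
# gpu_type -> (max num_cpus, allowed num_gpus values, max memory in GB)
GPU_LIMITS = {
    "h100": (252, (1, 2, 4, 8), 1440),
    "a100-80g": (252, (1, 2, 4, 8), 960),
    "v100": (96, (1, 4, 8), 825),
    "l40": (252, (1, 2, 4, 8), 480),
    "a10": (192, (1, 4, 8), 825),
    "a6000": (128, (1, 2, 4, 8), 480),
    "a4000": (64, (1, 2, 4, 8, 10), 240),
    "a5000": (64, (1, 2, 4, 8), 240),
    "t4": (96, (1, 4, 8), 412),
}


def check_gpu_request(gpu_type, num_cpus, num_gpus, memory_gb):
    """Return a list of table limits that the request exceeds (empty if none)."""
    if gpu_type not in GPU_LIMITS:
        return [f"unknown gpu_type {gpu_type!r}"]
    max_cpus, gpu_counts, max_memory_gb = GPU_LIMITS[gpu_type]
    problems = []
    if num_cpus > max_cpus:
        problems.append(f"num_cpus exceeds {max_cpus} for {gpu_type}")
    if num_gpus not in gpu_counts:
        problems.append(f"num_gpus must be one of {gpu_counts} for {gpu_type}")
    if memory_gb > max_memory_gb:
        problems.append(f"memory exceeds {max_memory_gb} GB for {gpu_type}")
    return problems


print(check_gpu_request("a100-80g", num_cpus=4, num_gpus=2, memory_gb=12))
print(check_gpu_request("v100", num_cpus=4, num_gpus=2, memory_gb=12))
```

In this sketch the second call reports a problem because the table lists only 1, 4, or 8 GPUs for "v100".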

For up-to-date pricing on these GPU types, please check here.

Time limits

Specifying a time_limit on a Cloud Executor defines the maximum amount of time that a task can take before being terminated. This generally results in the task exiting with an error. The time limit is intended as a “safety mechanism” to prevent idle or hanging tasks from accruing costs.
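Since the default value of 1800 corresponds to 30 minutes, integer time limits are in seconds. The sketch below shows how an int and a timedelta can express the same kind of limit. time_limit_seconds is a hypothetical helper, not part of covalent_cloud, and it deliberately leaves out string values because the accepted string formats are not specified here.

```python
from datetime import timedelta


def time_limit_seconds(limit):
    """Convert an int (seconds) or a timedelta to whole seconds.

    Hypothetical helper for illustration only; string time limits are
    not handled because their format is not described in this section.
    """
    if isinstance(limit, timedelta):
        return int(limit.total_seconds())
    if isinstance(limit, int):
        return limit
    raise TypeError("this sketch handles only int and timedelta values")


default_limit = time_limit_seconds(1800)
three_hours = time_limit_seconds(timedelta(hours=3))

print(default_limit, three_hours)        # 1800 10800
print(timedelta(seconds=default_limit))  # 0:30:00
```

Using timedelta for longer limits avoids hand-computed second counts, for example timedelta(hours=3) instead of 10800.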