GPU Access
Covalent Cloud provides access to a variety of GPUs. GPUs are used in Covalent by assigning GPU-equipped Cloud Executors to tasks.
Cloud Executors specify a modular set of compute resources together with a software environment (i.e., a Python version, Python packages, and any other libraries). Here's an example of a Cloud Executor that specifies 4x H100 GPUs and 4x CPUs:
```python
import covalent as ct
import covalent_cloud as cc

gpu_executor = cc.CloudExecutor(
    gpu_type="h100",
    num_gpus=4,
    num_cpus=4,
    memory="16GB",
    env="huggingface-training",
)

@ct.electron(executor=gpu_executor)
def train_model(model_id, data, parameters):
    # Your model training code here
    ...
```
GPU types
GPU types are specified using a Cloud Executor's `gpu_type` parameter. This parameter accepts either a member of the `GPU_TYPE` enum or a GPU name as a lowercase string. For example, `executor_1` and `executor_2` are equivalent in the following:
```python
import covalent_cloud as cc
from covalent_cloud.cloud_executor import GPU_TYPE

# using a name string
executor_1 = cc.CloudExecutor(gpu_type="h100", num_gpus=4)

# using the GPU_TYPE enum
executor_2 = cc.CloudExecutor(gpu_type=GPU_TYPE.H100, num_gpus=4)
```
A list of available GPU types is provided below.
GPU type | GPU name | vRAM per GPU | Details |
---|---|---|---|
H100 | 'h100' | 80 GB | details |
L40 | 'l40' | 48 GB | details |
A100 | 'a100-80g' | 80 GB | details |
A10 | 'a10' | 24 GB | details |
T4 | 't4' | 16 GB | details |
A6000 | 'a6000' | 48 GB | details |
See here for up-to-date pricing for each GPU type.
Cloud executor parameters
Each `CloudExecutor` parameter specifies a compute resource, except `gpu_type` and `env`.
parameter | type | default value | default value meaning |
---|---|---|---|
`num_cpus` | `int` | `1` | task execution uses 1 vCPU |
`memory` | `int` or `str` | `1024` | task execution uses 1024 MB of RAM |
`num_gpus` | `int` | `0` | task execution uses no GPUs |
`gpu_type` | `str` or `GPU_TYPE` | `''` | GPU type not specified (required when `num_gpus > 0`) |
`env` | `str` | `'default'` | task executes in the user's default software environment |
`time_limit` | `int`, `str`, or `timedelta` | `1800` | task execution will be cancelled after 30 minutes |
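For reference, the parameters above can be combined in a single executor. This is a sketch rather than a prescription; the environment name `my-env` is a placeholder for a software environment that exists in your own account.

```python
from datetime import timedelta

import covalent_cloud as cc

# Example executor setting every parameter explicitly.
# "my-env" is a placeholder environment name.
executor = cc.CloudExecutor(
    num_cpus=8,
    memory="32GB",          # also accepts an int, interpreted as MB
    num_gpus=1,
    gpu_type="l40",         # required whenever num_gpus > 0
    env="my-env",
    time_limit=timedelta(hours=1),
)
```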
Number of CPUs
The `num_cpus` parameter must be a positive `int` that indicates the number of vCPUs to make available to a task.
Memory
The `memory` parameter indicates the amount of RAM that a task can use. Integer values for this parameter are always interpreted as megabytes (MB). Memory can also be specified as a string value in units of MB, GB, or GiB, e.g. `memory="32GB"`. Note that maximum limits on `memory` vary for each GPU type.
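To make the unit handling concrete, here is a small stand-alone helper that interprets memory values the way described above. This is purely illustrative, not Covalent's internal parser, and it assumes 1 GB = 1000 MB and 1 GiB = 1024 MB; the platform's exact conversion may differ.

```python
import re

def memory_to_mb(memory):
    """Interpret a memory value: bare ints are MB; strings may carry
    an MB, GB, or GiB suffix.

    NOTE: illustrative only -- not Covalent's internal parser, and it
    assumes 1 GB = 1000 MB and 1 GiB = 1024 MB.
    """
    if isinstance(memory, int):
        return memory
    match = re.fullmatch(r"\s*(\d+)\s*(MB|GB|GiB)\s*", memory)
    if not match:
        raise ValueError(f"unrecognized memory value: {memory!r}")
    value, unit = int(match.group(1)), match.group(2)
    factor = {"MB": 1, "GB": 1000, "GiB": 1024}[unit]
    return value * factor

memory_to_mb("32GB")   # 32000
memory_to_mb(2048)     # 2048
```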
Number of GPUs
The `num_gpus` parameter indicates the desired number of GPUs. The number of GPUs can be (and is by default) `0`. (Note that the number of vCPUs must be at least `1`.) When an executor specifies one or more GPUs, `gpu_type` must also be specified to indicate the type of GPU to use.
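As a sketch of the rule above (executor definitions only; running them requires a valid account):

```python
import covalent_cloud as cc

# OK: CPU-only executor; num_gpus defaults to 0.
cpu_executor = cc.CloudExecutor(num_cpus=2, memory="4GB")

# OK: one or more GPUs together with an explicit gpu_type.
gpu_executor = cc.CloudExecutor(num_gpus=2, gpu_type="a100-80g")

# Not OK: num_gpus > 0 without a gpu_type.
# cc.CloudExecutor(num_gpus=2)
```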
Environment
Software environments can be created in the Covalent Cloud UI or programmatically with `cc.create_env()`. See this guide for more on creating software environments. An executor's `env` parameter must refer to an existing software environment in the user's account. By default, executors initialized with an invalid `env` parameter will immediately raise an error.
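As a sketch of programmatic environment creation, the snippet below creates an environment and references it by name in an executor. The package list is illustrative, and the exact set of `cc.create_env()` options is covered in the environments guide.

```python
import covalent_cloud as cc

# Create an environment named "huggingface-training" with the
# packages a training task might need (illustrative list).
cc.create_env(
    name="huggingface-training",
    pip=["torch", "transformers", "datasets"],
)

# The environment can then be referenced by name in an executor.
executor = cc.CloudExecutor(env="huggingface-training", num_cpus=2)
```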
Time limits
Specifying a `time_limit` on a Cloud Executor defines the maximum run time of a task. Overrunning the time limit generally results in exiting with an error. Time limits are intended to be used as a "safety mechanism" to prevent idle or hanging tasks from accruing costs.
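Per the parameter table, an integer `time_limit` of `1800` corresponds to 30 minutes, i.e. integers are interpreted in seconds. A `timedelta` expresses the same limit more readably (executor definitions only, as a sketch):

```python
from datetime import timedelta

import covalent_cloud as cc

# Equivalent 30-minute limits: an int is interpreted in seconds
# (1800 s = 30 min, matching the default), or pass a timedelta.
executor_seconds = cc.CloudExecutor(num_cpus=1, time_limit=1800)
executor_delta = cc.CloudExecutor(num_cpus=1, time_limit=timedelta(minutes=30))
```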
GPU details
This section tabulates valid ranges of executor parameters for each available GPU type.
NVIDIA H100 Tensor Core GPU
num_gpus | max num_cpus | max memory |
---|---|---|
1 | 28 | 180 GB |
2 | 60 | 360 GB |
4 | 124 | 720 GB |
8 | 252 | 1440 GB |
NVIDIA L40 GPU
num_gpus | max num_cpus | max memory |
---|---|---|
1 | 28 | 58 GB |
2 | 60 | 116 GB |
4 | 124 | 232 GB |
8 | 252 | 464 GB |
NVIDIA A100 Tensor Core GPU
num_gpus | max num_cpus | max memory |
---|---|---|
1 | 28 | 120 GB |
2 | 60 | 240 GB |
4 | 124 | 480 GB |
8 | 252 | 960 GB |
NVIDIA A10G Tensor Core GPU
num_gpus | max num_cpus | max memory |
---|---|---|
1 | 48 | 103 GB |
4 | 192 | 412 GB |
8 | 192 | 768 GB |
NVIDIA T4 Tensor Core GPU
num_gpus | max num_cpus | max memory |
---|---|---|
1 | 4 | 16 GB |
4 | 48 | 192 GB |
8 | 192 | 768 GB |
NVIDIA RTX A6000 Graphics Card
num_gpus | max num_cpus | max memory |
---|---|---|
1 | 28 | 58 GB |
2 | 60 | 116 GB |
4 | 124 | 232 GB |
8 | 252 | 464 GB |
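The tables above can be transcribed into code for quick client-side sanity checks before submitting a workflow. The helper below is hypothetical, not part of the `covalent_cloud` API; only the H100 and L40 rows are shown.

```python
# Per-GPU-count limits (max vCPUs, max memory in GB) transcribed from
# the tables above. Local sanity-check data, not a covalent_cloud API.
LIMITS = {
    "h100": {1: (28, 180), 2: (60, 360), 4: (124, 720), 8: (252, 1440)},
    "l40":  {1: (28, 58),  2: (60, 116), 4: (124, 232), 8: (252, 464)},
}

def check_request(gpu_type, num_gpus, num_cpus, memory_gb):
    """Return True if the request fits within the tabulated limits."""
    max_cpus, max_mem_gb = LIMITS[gpu_type][num_gpus]
    return num_cpus <= max_cpus and memory_gb <= max_mem_gb

check_request("h100", 4, 64, 256)   # True: within 124 vCPUs / 720 GB
check_request("l40", 1, 32, 16)     # False: 32 vCPUs exceeds the 28 max
```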