GPU Access

Covalent Cloud provides access to a variety of GPUs, as tabulated below.

Executing tasks on GPU resources requires assigning a GPU-equipped cloud executor to the tasks in question. Cloud executors specify a modular set of resources like vCPUs, GPUs, memory, and storage, as well as the software environment (i.e. Python version, Python packages, and any other libraries).

Here’s an example of a cloud executor that specifies 4x H100 GPUs.

import covalent as ct
import covalent_cloud as cc

# Executor requesting 4x H100 GPUs, 12 vCPUs, and 16 GB of RAM
gpu_executor = cc.CloudExecutor(
    gpu_type="h100",
    num_gpus=4,
    num_cpus=12,
    memory="16GB",
    env="huggingface-training",  # software environment the task runs in
)

@ct.electron(executor=gpu_executor)
def train_model(model_id, data, parameters):
    # Your model training code here
    ...

GPU Types

The following types of GPUs are currently supported in Covalent Cloud. Note that memory refers to system RAM, whereas vRAM refers to a GPU’s on-board memory.

gpu_type	GPU Type	vRAM per GPU	Max num_cpus	Allowed num_gpus	Max memory
"h100"	H100 80GB	80 GB	252	1,2,4,8	1440 GB
"a100-80g"	A100 80GB	80 GB	252	1,2,4,8	960 GB
"v100"	V100	16 GB	96	1,4,8	825 GB
"l40"	L40	48 GB	252	1,2,4,8	480 GB
"a10"	A10G	24 GB	192	1,4,8	825 GB
"a6000"	RTX A6000	48 GB	128	1,2,4,8	480 GB
"a4000"	RTX A4000	16 GB	64	1,2,4,8,10	240 GB
"a5000"	RTX A5000	24 GB	64	1,2,4,8	240 GB
"t4"	T4	16 GB	96	1,4,8	412 GB

Each GPU type is priced differently. See here for up-to-date GPU pricing.
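
The limits in the table above can be checked locally before an executor is created. The following is a minimal sketch, not part of the Covalent Cloud API: the GPU_LIMITS dictionary and the check_gpu_request helper are hypothetical names, and the values are simply transcribed from the table, so they may go out of date.

```python
# Limits transcribed from the GPU table above.
# GPU_LIMITS and check_gpu_request are hypothetical names, not Covalent Cloud APIs.
GPU_LIMITS = {
    # gpu_type: (max num_cpus, allowed num_gpus, max memory in GB)
    "h100": (252, (1, 2, 4, 8), 1440),
    "a100-80g": (252, (1, 2, 4, 8), 960),
    "v100": (96, (1, 4, 8), 825),
    "l40": (252, (1, 2, 4, 8), 480),
    "a10": (192, (1, 4, 8), 825),
    "a6000": (128, (1, 2, 4, 8), 480),
    "a4000": (64, (1, 2, 4, 8, 10), 240),
    "a5000": (64, (1, 2, 4, 8), 240),
    "t4": (96, (1, 4, 8), 412),
}


def check_gpu_request(gpu_type, num_gpus, num_cpus, memory_gb):
    """Return a list of problems; an empty list means the request fits the table."""
    if gpu_type not in GPU_LIMITS:
        return [f"unknown gpu_type {gpu_type!r}"]
    max_cpus, allowed_gpus, max_memory_gb = GPU_LIMITS[gpu_type]
    problems = []
    if num_gpus not in allowed_gpus:
        problems.append(f"num_gpus={num_gpus} is not one of {allowed_gpus}")
    if num_cpus > max_cpus:
        problems.append(f"num_cpus={num_cpus} exceeds the maximum of {max_cpus}")
    if memory_gb > max_memory_gb:
        problems.append(f"memory of {memory_gb} GB exceeds the maximum of {max_memory_gb} GB")
    return problems


# The 4x H100 request from the earlier example fits the table.
print(check_gpu_request("h100", 4, 12, 16))
# Two V100s is not an allowed count, so one problem is reported.
print(check_gpu_request("v100", 2, 12, 16))
```

Returning a list of problems rather than raising on the first one lets a caller report every violated limit at once.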

Cloud executor parameters

Each parameter in a CloudExecutor instance specifies a relevant resource, whether it’s hardware, memory, or time. With the exception of gpu_type and env, the value of each parameter reflects the amount of that resource available to an electron that’s assigned a given executor.

name	type	default value	interpretation of default value
num_cpus	int	1	task execution uses 1 vCPU
memory	int or str	1024	task execution uses 1024 MB of RAM
num_gpus	int	0	task execution uses no GPUs
gpu_type	str	''	GPU type not specified (necessary when num_gpus > 0)
env	str	'default'	task executes in the user’s default software environment
time_limit	int, str, or timedelta	1800	task execution will be cancelled after 30 minutes
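
To make the defaults concrete, here is a small sketch that merges user overrides into the values from the table above and enforces the one stated constraint (gpu_type is necessary when num_gpus > 0). The EXECUTOR_DEFAULTS dictionary and the resolve_executor_settings helper are hypothetical and only illustrate the table. They do not reflect how CloudExecutor is implemented.

```python
from datetime import timedelta

# Defaults transcribed from the parameter table above.
# EXECUTOR_DEFAULTS and resolve_executor_settings are hypothetical names.
EXECUTOR_DEFAULTS = {
    "num_cpus": 1,
    "memory": 1024,      # interpreted as MB, per the table
    "num_gpus": 0,
    "gpu_type": "",
    "env": "default",
    "time_limit": 1800,  # interpreted as seconds (30 minutes), per the table
}


def resolve_executor_settings(**overrides):
    """Merge overrides into the documented defaults and check the GPU constraint."""
    unknown = sorted(set(overrides) - set(EXECUTOR_DEFAULTS))
    if unknown:
        raise TypeError(f"unknown parameter(s): {unknown}")
    settings = {**EXECUTOR_DEFAULTS, **overrides}
    # The table notes that gpu_type is necessary when num_gpus > 0.
    if settings["num_gpus"] > 0 and not settings["gpu_type"]:
        raise ValueError("gpu_type must be set when num_gpus > 0")
    return settings


# Override only what differs from the defaults; time_limit may be a timedelta.
settings = resolve_executor_settings(
    num_gpus=1, gpu_type="t4", time_limit=timedelta(hours=2)
)
print(settings)

# The default time limit of 1800 corresponds to 30 minutes.
print(timedelta(seconds=EXECUTOR_DEFAULTS["time_limit"]) == timedelta(minutes=30))
```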
