
GPU Access

Covalent Cloud provides access to a variety of GPUs. GPUs are used in Covalent by assigning GPU-equipped Cloud Executors to tasks.

Cloud Executors specify a modular set of compute resources, together with a software environment (i.e. a Python version, Python packages, and any other libraries). Here’s an example of a Cloud Executor that specifies 4 H100 GPUs and 4 vCPUs.

import covalent as ct
import covalent_cloud as cc

gpu_executor = cc.CloudExecutor(
    gpu_type="h100",
    num_gpus=4,
    num_cpus=4,
    memory="16GB",
    env="huggingface-training",
)

@ct.electron(executor=gpu_executor)
def train_model(model_id, data, parameters):
    # Your model training code here
    # ...
    pass

GPU types

GPU types are specified using a Cloud Executor's gpu_type parameter.

This parameter accepts either a member of the GPU_TYPE enum or a GPU name as a lowercase string. For example, executor_1 and executor_2 are equivalent in the following:

import covalent_cloud as cc
from covalent_cloud.cloud_executor import GPU_TYPE

# using a name string
executor_1 = cc.CloudExecutor(gpu_type="h100", num_gpus=4)

# using the GPU_TYPE enum
executor_2 = cc.CloudExecutor(gpu_type=GPU_TYPE.H100, num_gpus=4)

A list of available GPU types is provided below.

| GPU type | GPU name | vRAM per GPU |
|---|---|---|
| H100 | 'h100' | 80 GB |
| L40 | 'l40' | 48 GB |
| A100 | 'a100-80g' | 80 GB |
| A10 | 'a10' | 24 GB |
| T4 | 't4' | 16 GB |
| A6000 | 'a6000' | 48 GB |

See here for up-to-date pricing for each GPU type.

Cloud executor parameters

Each CloudExecutor parameter specifies a compute resource, except gpu_type and env.

| parameter | type | default value | default value meaning |
|---|---|---|---|
| num_cpus | int | 1 | task execution uses 1 vCPU |
| memory | int or str | 1024 | task execution uses 1024 MB of RAM |
| num_gpus | int | 0 | task execution uses no GPUs |
| gpu_type | str or GPU_TYPE | '' | GPU type not specified (necessary when num_gpus > 0) |
| env | str | 'default' | task executes in the user’s default software environment |
| time_limit | int, str, or timedelta | 1800 | task execution will be cancelled after 30 minutes |

Number of CPUs

The num_cpus parameter must be a positive int indicating the number of vCPUs to be made available to a task.

Memory

The memory parameter indicates the amount of RAM that a task can use. Integer values for this parameter are always interpreted as megabytes (MB). Memory can also be specified in units of GB or GiB (as well as MB) with a string value, e.g. memory="32GB". Note that maximum memory limits vary for each GPU type.
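As a rough sketch of how such values could be normalized, the hypothetical helper below (not part of the covalent_cloud SDK, and the 1000-vs-1024 unit conventions are assumptions) interprets integers as MB and parses the string suffixes described above:

```python
# Illustrative helper, NOT an SDK function: normalize a memory value to MB,
# following the rules described above. Unit conventions are assumptions.
def memory_to_mb(memory):
    """Interpret an int as MB; parse 'MB', 'GB', or 'GiB' string suffixes."""
    if isinstance(memory, int):
        return memory  # plain integers are always MB
    value = memory.strip()
    if value.endswith("GiB"):
        return int(float(value[:-3]) * 1024)  # assume 1 GiB = 1024 MiB
    if value.endswith("GB"):
        return int(float(value[:-2]) * 1000)  # assume 1 GB = 1000 MB
    if value.endswith("MB"):
        return int(float(value[:-2]))
    raise ValueError(f"Unrecognized memory string: {memory!r}")

print(memory_to_mb("32GB"))   # 32000
print(memory_to_mb("16GiB"))  # 16384
print(memory_to_mb(2048))     # 2048
```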

Number of GPUs

The num_gpus parameter indicates the desired number of GPUs. The number of GPUs can be (and is by default) 0; note that the number of vCPUs must always be at least 1. When an executor specifies one or more GPUs, gpu_type must also be specified to indicate the type of GPU to use.
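The constraints above can be sketched as a small check; the validate_resources helper is hypothetical, written only to illustrate the rules, and is not part of the covalent_cloud SDK:

```python
# Illustrative check, NOT SDK code: the resource constraints described above.
def validate_resources(num_cpus=1, num_gpus=0, gpu_type=""):
    if num_cpus < 1:
        raise ValueError("num_cpus must be at least 1")
    if num_gpus > 0 and not gpu_type:
        raise ValueError("gpu_type is required when num_gpus > 0")
    return True

validate_resources(num_cpus=4, num_gpus=4, gpu_type="h100")  # OK
validate_resources(num_cpus=1)                               # OK: GPUs optional
```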

Environment

Software environments can be created in the Covalent Cloud UI or programmatically with cc.create_env(). See this guide for more on creating software environments. An executor’s env parameter must refer to an existing software environment in the user’s account. Executors initialized with an invalid env parameter will immediately raise an error by default.

Time limits

Specifying a time_limit on a Cloud Executor defines the maximum run time of a task. A task that overruns its time limit generally exits with an error. Time limits are intended as a “safety mechanism” to prevent idle or hanging tasks from accruing costs.
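The parameter table above lists int, str, and timedelta as accepted types for time_limit. As an illustration only, the hypothetical helper below normalizes all three forms to seconds; the "HH:MM:SS" string format and the helper itself are assumptions, not part of the covalent_cloud SDK:

```python
from datetime import timedelta

# Illustrative helper, NOT SDK code: normalize the three accepted time_limit
# forms (int seconds, "HH:MM:SS" string — an assumed format — or timedelta).
def time_limit_seconds(limit):
    if isinstance(limit, timedelta):
        return int(limit.total_seconds())
    if isinstance(limit, int):
        return limit  # plain integers taken as seconds
    hours, minutes, seconds = (int(part) for part in limit.split(":"))
    return hours * 3600 + minutes * 60 + seconds

print(time_limit_seconds(1800))                   # 1800
print(time_limit_seconds("00:30:00"))             # 1800
print(time_limit_seconds(timedelta(minutes=30)))  # 1800
```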

GPU details

This section tabulates valid ranges of executor parameters for each available GPU type.

NVIDIA H100 Tensor Core GPU

| num_gpus | max num_cpus | max memory |
|---|---|---|
| 1 | 28 | 180 GB |
| 2 | 60 | 360 GB |
| 4 | 124 | 720 GB |
| 8 | 252 | 1440 GB |

NVIDIA L40 GPU

| num_gpus | max num_cpus | max memory |
|---|---|---|
| 1 | 28 | 58 GB |
| 2 | 60 | 116 GB |
| 4 | 124 | 232 GB |
| 8 | 252 | 464 GB |

NVIDIA A100 Tensor Core GPU

| num_gpus | max num_cpus | max memory |
|---|---|---|
| 1 | 28 | 120 GB |
| 2 | 60 | 240 GB |
| 4 | 124 | 480 GB |
| 8 | 252 | 960 GB |

NVIDIA A10G Tensor Core GPU

| num_gpus | max num_cpus | max memory |
|---|---|---|
| 1 | 48 | 103 GB |
| 4 | 192 | 412 GB |
| 8 | 192 | 768 GB |

NVIDIA T4 Tensor Core GPU

| num_gpus | max num_cpus | max memory |
|---|---|---|
| 1 | 4 | 16 GB |
| 4 | 48 | 192 GB |
| 8 | 192 | 768 GB |

NVIDIA RTX A6000 Graphics Card

| num_gpus | max num_cpus | max memory |
|---|---|---|
| 1 | 28 | 58 GB |
| 2 | 60 | 116 GB |
| 4 | 124 | 232 GB |
| 8 | 252 | 464 GB |