GPU Access
Covalent Cloud provides access to a variety of GPUs, as tabulated below.
Executing tasks on GPU resources requires assigning a GPU-equipped cloud executor to the tasks in question. Cloud executors specify a modular set of resources like vCPUs, GPUs, memory, and storage, as well as the software environment (i.e., Python version, Python packages, and any other libraries).
Here’s an example of a cloud executor that specifies 4x H100 GPUs.
```python
import covalent as ct
import covalent_cloud as cc

gpu_executor = cc.CloudExecutor(
    gpu_type="h100",
    num_gpus=4,
    num_cpus=12,
    memory="16GB",
    env="huggingface-training",
)

@ct.electron(executor=gpu_executor)
def train_model(model_id, data, parameters):
    # Your model training code here
    ...
```
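For context, here is a hedged sketch of running an electron like the one above as part of a complete workflow. It assumes a configured Covalent Cloud account, that the `huggingface-training` environment exists, and that `cc.dispatch`/`cc.get_result` are available in your SDK version; the `load_data` electron and all argument values are illustrative placeholders.

```python
import covalent as ct
import covalent_cloud as cc

# A lighter CPU-only executor for the data-loading step (illustrative values).
cpu_executor = cc.CloudExecutor(num_cpus=2, memory="4GB", env="huggingface-training")

@ct.electron(executor=cpu_executor)
def load_data(dataset_id):
    # Fetch and preprocess the training data (placeholder).
    ...

@ct.lattice
def fine_tune(model_id, dataset_id, parameters):
    # train_model is the GPU-backed electron defined above.
    data = load_data(dataset_id)
    return train_model(model_id, data, parameters)

# Dispatch to Covalent Cloud and block until the result is ready.
run_id = cc.dispatch(fine_tune)("model-id", "dataset-id", {"epochs": 1})
result = cc.get_result(run_id, wait=True)
```

Running this requires Covalent Cloud credentials, so it is shown here only to illustrate how an executor-equipped electron slots into a dispatched workflow.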
GPU Types
The following types of GPUs are currently supported in Covalent Cloud. Note that memory refers to host RAM (the `memory` parameter), whereas vRAM refers to a GPU's own onboard memory.
| `gpu_type` | GPU model | vRAM per GPU | Max `num_cpus` | Allowed `num_gpus` | Max `memory` |
| --- | --- | --- | --- | --- | --- |
| `"h100"` | H100 80GB | 80 GB | 252 | 1, 2, 4, 8 | 1440 GB |
| `"a100-80g"` | A100 80GB | 80 GB | 252 | 1, 2, 4, 8 | 960 GB |
| `"v100"` | V100 | 16 GB | 96 | 1, 4, 8 | 825 GB |
| `"l40"` | L40 | 48 GB | 252 | 1, 2, 4, 8 | 480 GB |
| `"a10"` | A10G | 24 GB | 192 | 1, 4, 8 | 825 GB |
| `"a6000"` | RTX A6000 | 48 GB | 128 | 1, 2, 4, 8 | 480 GB |
| `"a4000"` | RTX A4000 | 16 GB | 64 | 1, 2, 4, 8, 10 | 240 GB |
| `"a5000"` | RTX A5000 | 24 GB | 64 | 1, 2, 4, 8 | 240 GB |
| `"t4"` | T4 | 16 GB | 96 | 1, 4, 8 | 412 GB |
Each GPU type is priced differently. See here for up-to-date GPU pricing.
Cloud executor parameters
Each parameter in a `CloudExecutor` instance specifies a particular resource, whether hardware, memory, or time. With the exception of `gpu_type`, the value of each parameter reflects the amount of that resource available to an electron assigned the executor in question.
| name | type | default value | interpretation of default value |
| --- | --- | --- | --- |
| `num_cpus` | `int` | `1` | task execution uses 1 vCPU |
| `memory` | `int` or `str` | `1024` | task execution uses 1024 MB of RAM |
| `num_gpus` | `int` | `0` | task execution uses no GPUs |
| `gpu_type` | `str` | `''` | no GPU type specified (must be set whenever `num_gpus > 0`) |
| `env` | `str` | `'default'` | task executes in the user's default software environment |
| `time_limit` | `int`, `str`, or `timedelta` | `1800` | task execution is cancelled after 30 minutes |
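Since `memory` and `time_limit` each accept several types, it can help to see how the different forms line up. The sketch below is not the SDK's actual implementation; it simply assumes that a bare `int` means MB (for memory) or seconds (for time), that memory strings carry an `MB`/`GB` suffix as in `"16GB"` above, and that time strings use an `HH:MM:SS` form. Treat the string formats as illustrative assumptions rather than a spec.

```python
import re
from datetime import timedelta

def normalize_memory(memory):
    """Normalize a memory spec to MB: a bare int is MB; strings like '512MB' or '16GB'."""
    if isinstance(memory, int):
        return memory
    match = re.fullmatch(r"\s*(\d+)\s*(MB|GB)\s*", memory, re.IGNORECASE)
    if not match:
        raise ValueError(f"unrecognized memory spec: {memory!r}")
    value, unit = int(match.group(1)), match.group(2).upper()
    return value * 1024 if unit == "GB" else value

def normalize_time_limit(limit):
    """Normalize a time limit to seconds: a bare int is seconds; 'HH:MM:SS' strings; timedelta."""
    if isinstance(limit, timedelta):
        return int(limit.total_seconds())
    if isinstance(limit, int):
        return limit
    hours, minutes, seconds = (int(part) for part in limit.split(":"))
    return hours * 3600 + minutes * 60 + seconds
```

Under these assumptions, `1024`, `"1024MB"`, and `"1GB"` all denote the same memory allocation, and the default `time_limit` of `1800` is equivalent to `"00:30:00"` or `timedelta(minutes=30)`.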