Google Cloud Platform

The Covalent Google Batch executor is an interface between Covalent and Google Cloud Platform’s Batch compute service, allowing Covalent tasks to be executed on Google Batch.

This executor is well suited for tasks with high compute or memory requirements. The required compute resources can be specified in the executor’s configuration. Google Batch scales well, allowing users to queue and execute multiple tasks concurrently, while Google’s Batch job scheduler handles allocating the resources each task needs and de-allocating them once the job has finished.

1. Installation

To use this plugin with Covalent, simply install it using pip:

pip install covalent-gcpbatch-plugin

The required cloud resources can be created with:

covalent deploy up gcpbatch

See Automated Cloud Compute Deployment for more information.

2. Usage Example

Here we present an example of how to use the GCP Batch executor plugin in a Covalent workflow. In this example we train a simple support vector machine (SVM) model using the Google Batch executor. The executor requires relatively few cloud resources to be provisioned prior to first use. The Google Batch executor needs the following cloud resources pre-configured:

  • A Google Cloud Storage bucket

  • An Artifact Registry repository for Docker images

  • A service account with the following permissions:

    • roles/batch.agentReporter
    • roles/logging.viewer
    • roles/artifactregistry.reader
    • roles/storage.objectCreator
    • roles/storage.objectViewer

Note

Details about Google service accounts and how to use them properly can be found in the Google Cloud service accounts documentation.
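
As a quick sanity check before the first workflow run, it can be helpful to verify that the service account can write to and read from the storage bucket. The sketch below is illustrative only: it assumes the google-cloud-storage client library is installed, that GOOGLE_APPLICATION_CREDENTIALS points to a key file for the service account, and that the bucket and project names (placeholders here) match your setup.

# Illustrative sketch: sanity-check the service account's bucket permissions.
# Assumes `pip install google-cloud-storage` and that the environment variable
# GOOGLE_APPLICATION_CREDENTIALS points to a key file for the service account.
# "my-gcp-project-id" and "my-gcp-bucket" are placeholders.
from google.cloud import storage

client = storage.Client(project="my-gcp-project-id")
bucket = client.bucket("my-gcp-bucket")

# Exercises roles/storage.objectCreator by writing a small object ...
blob = bucket.blob("covalent-permission-check.txt")
blob.upload_from_string("ok")

# ... and roles/storage.objectViewer by reading it back.
assert blob.download_as_text() == "ok"

With these resources in place, the full workflow example follows: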

from numpy.random import permutation
from sklearn import svm, datasets
import covalent as ct

deps_pip = ct.DepsPip(
    packages=["numpy==1.23.2", "scikit-learn==1.1.2"]
)

executor = ct.executor.GCPBatchExecutor(
    bucket_name="my-gcp-bucket",
    region="us-east1",
    project_id="my-gcp-project-id",
    container_image_uri="my-executor-container-image-uri",
    service_account_email="my-service-account-email",
    vcpus=2,  # Number of vCPUs to allocate
    memory=512,  # Memory in MB to allocate
    time_limit=300,  # Time limit of job in seconds
    poll_freq=3,  # Number of seconds to pause before polling for the job's status
)

# Use the executor plugin to train our SVM model.
@ct.electron(
    executor=executor,
    deps_pip=deps_pip,
)
def train_svm(data, C, gamma):
    X, y = data
    clf = svm.SVC(C=C, gamma=gamma)
    clf.fit(X[90:], y[90:])
    return clf

@ct.electron
def load_data():
    iris = datasets.load_iris()
    perm = permutation(iris.target.size)
    iris.data = iris.data[perm]
    iris.target = iris.target[perm]
    return iris.data, iris.target

@ct.electron
def score_svm(data, clf):
    X_test, y_test = data
    return clf.score(
        X_test[:90],
        y_test[:90],
    )

@ct.lattice
def run_experiment(C=1.0, gamma=0.7):
    data = load_data()
    clf = train_svm(
        data=data,
        C=C,
        gamma=gamma,
    )
    score = score_svm(
        data=data,
        clf=clf,
    )
    return score

# Dispatch the workflow
dispatch_id = ct.dispatch(run_experiment)(
    C=1.0,
    gamma=0.7,
)

# Wait for the result and get its value
result = ct.get_result(dispatch_id=dispatch_id, wait=True).result

print(result)

While the workflow is running, the user can navigate to the browser-based Covalent UI to monitor the status of the computations.
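
The status can also be queried programmatically without blocking. A brief sketch: passing wait=False to ct.get_result returns immediately with the workflow’s current state.

# Check the workflow's status without waiting for completion.
status_check = ct.get_result(dispatch_id=dispatch_id, wait=False)
print(status_check.status)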

3. Overview of Configuration

Config Key            | Required | Default       | Description
project_id            | Yes      | None          | Google Cloud project ID
region                | No       | us-east1      | Google Cloud region to use for submitting Batch jobs
bucket_name           | Yes      | None          | Name of the Google Cloud Storage bucket used for storing temporary objects
container_image_uri   | Yes      | None          | GCP Batch executor base Docker image URI
service_account_email | Yes      | None          | Email address of the Google service account used by the Batch job when interacting with cloud resources
vcpus                 | No       | 2             | Number of vCPUs needed for the task
memory                | No       | 256           | Memory (in MB) needed by the task
retries               | No       | 3             | Number of times a job is retried if it fails
time_limit            | No       | 300           | Time limit (in seconds) after which jobs are killed
poll_freq             | No       | 5             | Frequency (in seconds) with which to poll a submitted task
cache_dir             | No       | /tmp/covalent | Cache directory used by this executor for temporary files

This plugin can be configured in one of two ways:

  1. Configuration options can be passed as keyword arguments to the executor class constructor, ct.executor.GCPBatchExecutor.

  2. Configuration options can be set in the Covalent configuration file under the section [executors.gcpbatch]:

[executors.gcpbatch]
bucket_name = <my-gcp-bucket-name>
project_id = <my-gcp-project-id>
container_image_uri = <my-base-executor-image-uri>
service_account_email = <my-service-account-email>
region = <google region for batch>
vcpus = 2 # number of vcpus needed by the job
memory = 256 # memory in MB required by the job
retries = 3 # number of times to retry the job if it fails
time_limit = 300 # time limit in seconds after which the job is to be considered failed
poll_freq = 3 # Frequency in seconds with which to poll the job for the result
cache_dir = "/tmp" # Path on file system to store temporary objects

4. Required Cloud Resources

Certain GCP resources must exist before workflows can be run with the GCP Batch executor plugin. This plugin supports automated resource creation with:

covalent deploy up gcpbatch

See Automated Cloud Compute Deployment for more information.

  • Google storage bucket

    • The executor uses a storage bucket to store and cache exception and result objects generated during task execution (a manual creation sketch follows this list).
  • Google Artifact Registry for Docker images

    • The executor submits a container job whose image is pulled from the provided container_image_uri argument of the executor.
  • Service account

    • In keeping with good security practices, jobs are executed using a service account that has only the permissions required for the job to finish.
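
For users who prefer to provision resources by hand rather than with covalent deploy up gcpbatch, the sketch below shows one way to create the storage bucket using the google-cloud-storage client library; the project ID, bucket name, and region are placeholders. The artifact registry and service account are typically created through the Cloud Console or the gcloud CLI.

# Illustrative sketch: create the storage bucket manually.
# Assumes `pip install google-cloud-storage` and suitable credentials;
# the names below are placeholders.
from google.cloud import storage

client = storage.Client(project="my-gcp-project-id")
bucket = client.create_bucket("my-gcp-bucket", location="us-east1")
print(f"Created bucket {bucket.name} in {bucket.location}")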