This page is dedicated to solutions to common issues encountered when using the Covalent platform.
If a workflow is hanging, it is likely due to a task that is not completing. To get more information on the status of the workflow, users can check the details of the Covalent server via:
In order to get more informative logs, users can start the Covalent server with
covalent start -d
This may happen for a Local Executor since a worker process in
ProcessPoolExecutor can stall sporadically without failing. In this case, the user needs to restart the server and resubmit the workflow.
A workflow can fail for a number of reasons. The most common reasons are:
- A task is failing to execute due to a runtime error arising from the task definition. In this case, check the logs of the Covalent server to see if there are any errors and the task error cards in the Covalent UI.
- The executor memory and compute resources are insufficient. For memory issues with the Dask executor, the allocated memory per worker needs to increased via (provided the user has enough memory available):
covalent start -n 4 -m "2GB" -d
Check out this discussion for more details.
- Some larger workflows fail for MacOS users due to a too many open files in system error. This can be resolved by increasing the open file limit via the terminal command
ulimit -n 10240and restarting Covalent.
Covalent server not starting
If dispatches are failing due to connection refused errors after running covalent
start it is possible the covalent server was unable to start.
- Ensure that you are able to run the
pythoncommand and that
python --versionis compatible with covalent refer to compatibility section. If your python version is not compatible or if you only have
python3installed it is recommended that a virtual environment is used (several tools can also be leveraged for this: poetry, conda, pyenv, ect.)
Users should ultimately check
covalent logs for more information and submit a new issue on Github discussion with any relevant log information associated with the issue.
Covalent CLI commands throws error/warning
- The covalent config file can periodically get corrupted due to multiple processes attempting to modify the config file simultaneously. This can sometimes be fixed by manually editing the config file. However, if this does not work, the user can delete the config file and restart the Covalent server.
The user also has the option of running
covalent purge which will delete the config file. A new config file will be created when the user runs
covalent start again. This option must be used with caution.
- If DB migration error is thrown, that implies that the database schema is not up to date with the latest version of Covalent. This can be fixed by running covalent db migrate. For more information, check out What To Do When Encountering Database Migration Errors
Getting Result fails for long workflows
- For long-running workflows if a user runs
wait=Trueand observes a
RecursionError: maximum recursion depth exceedederror this means that the result may still be pending or complete but covalent failed during the polling process. Users should still be able to re-run the command to continue waiting for a workflow result.
Executor issues after installation
- Users can get
executor not foundor
Covalent config file missing default valuesfor the executor if Covalent was not restarted after the executor was installed. This can be fixed by restarting the Covalent server.
Lattice not found error when using Self-hosted Covalent
- Errors related to the lattice not being found arise from the user trying to retrieve / access data corresponding to a dispatch id that does not exist in the database. This can happen when the self-hosted and local Covalent servers get mixed up. This can be avoided by explicitly specifying the dispatcher address when dispatching and retrieving results.
In general, users should set the
dispatcher_addr in the
ct.dispatch()/ct.get_result() functions rather than using ct.set_config if they’d like to only temporarily change the dispatcher address.
- The dispatch id is invalid.
Connection timeout error when using Self-hosted Covalent
If a user is getting a connection timeout error while using self-hosted Covalent, it is likely that the local and self-hosted servers are getting mixed up. In this case, the user needs to ensure that the dispatcher address is explicitly set and that the corresponding Covalent server is actually running.