Migrating Nextflow pipelines from UGE to SLURM
Our IT department decided to switch from Univa Grid Engine (UGE) to the SLURM workload manager, so I had to adapt my pipelines to the new system.
Luckily my pipelines are written in Nextflow, an open-source workflow system developed at the Centre for Genomic Regulation that natively supports several workload managers.
In brief, I had to add a new profile to my nextflow.config file with the information needed to submit jobs to the new SLURM workload manager.
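A minimal sketch of what such a profile could look like. The profile name `slurm` and the location of slurm.config next to nextflow.config are assumptions made for illustration; adapt them to your own layout.

```groovy
// nextflow.config (fragment)
profiles {
    // Hypothetical profile name, selected at run time with -profile slurm
    slurm {
        // Load the SLURM-specific settings from a separate file.
        // Path is an assumption: slurm.config sits beside nextflow.config.
        includeConfig 'slurm.config'
    }
}
```

With this in place, the pipeline would be launched with something like `nextflow run main.nf -profile slurm`, where `main.nf` stands for your own pipeline script.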
In the slurm.config file we need to indicate the executor and the names of the queues, as we did for UGE / SGE.
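As an illustration, a slurm.config along these lines sets the executor and the queues. The queue names (`short`, `long`) and the `big_mem` label are hypothetical; use the partitions your cluster actually exposes (for example those listed by `sinfo`).

```groovy
// slurm.config (fragment)
process {
    // Submit every process through SLURM
    executor = 'slurm'

    // Default queue (SLURM partition); hypothetical name
    queue = 'short'

    // Processes carrying this hypothetical label go to another queue
    withLabel: big_mem {
        queue  = 'long'
        memory = '64 GB'
    }
}
```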
In some settings there are no separate queue names for different resource requirements; instead, the distinction is made through different quality of service (QoS) levels. Nextflow does not support QoS natively, but we can use the clusterOptions directive to pass the corresponding option (`--qos`) to the scheduler.
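A sketch of how this could look, assuming a single partition and two QoS levels. The names `general`, `short` and `long` and the six-hour threshold are invented for the example; only `--qos` itself is a real sbatch option.

```groovy
// slurm.config (fragment)
process {
    executor = 'slurm'
    queue    = 'general'   // hypothetical single partition

    // Default time limit, so that task.time is always defined below
    time = '1h'

    // Evaluated per task: choose the QoS from the requested time.
    // QoS names and threshold are assumptions for illustration.
    clusterOptions = { task.time <= 6.h ? '--qos=short' : '--qos=long' }
}
```

Because the value is a closure, it is evaluated for each task, so a process that requests more time in the pipeline automatically lands in the other QoS.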
Since clusterOptions can be set dynamically, we can switch the queue or the QoS depending on the requested time or other variables.
Our IT department also decided to set up a dedicated queue for submitting Nextflow jobs, to avoid overloading the access nodes. In brief, we submit the main Nextflow job to these nodes, which can in turn submit jobs to the cluster. The main limitation of this setup is related to the killing of the main job. When the main job is terminated, there is no time left for Nextflow to kill all the jobs it has submitted to the cluster. We therefore need to define a trap function in the submission script that catches the SIGTERM signal and propagates it to the main Nextflow process. In this way, Nextflow has time to terminate all the child jobs before exiting.
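The trap described above can be sketched as a small submission script. The partition name `nextflow`, the memory request, and the function and variable names are assumptions for illustration; the final Nextflow command is left commented out and should be replaced with your own.

```shell
#!/bin/bash
# Hypothetical resource requests for the main Nextflow job
#SBATCH --no-requeue
#SBATCH --mem=4G
#SBATCH --partition=nextflow

# Trap handler: pass SIGTERM on to the background child,
# then wait until it has finished its own cleanup.
forward_term() {
    echo "Caught SIGTERM, forwarding it to PID $child_pid" >&2
    kill -s TERM "$child_pid" 2>/dev/null
    wait "$child_pid"
}

# Run a command in the background so that the shell itself stays
# free to receive SIGTERM, then wait for the command to finish.
run_with_term_forwarding() {
    trap forward_term TERM
    "$@" &
    child_pid=$!
    wait "$child_pid"
}

# In the real job script, uncomment and adapt (placeholder names):
# run_with_term_forwarding nextflow run main.nf -profile slurm
```

Running Nextflow in the background and calling `wait` matters here: bash only runs a trap once the current foreground command returns, whereas `wait` is interrupted as soon as the signal arrives, so the handler can forward it immediately.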