LOTUS queues

This article introduces the new LSF queues on LOTUS.  It covers:

  • Queue name
  • Queue details
  • Queue priority
  • How to use serial queues
  • How to use parallel queues

Queue name 

LOTUS has five public queues:

  • short-serial
  • long-serial
  • par-single
  • par-multi
  • high-mem

Each queue is characterised by its run time limit (e.g. short, long) and the resources it provides. A full breakdown of each queue and its associated resources is shown below in Table 1.

Queue details

Queues represent a set of pending jobs, lined up in a defined order and waiting for their opportunity to use resources. Jobs must be submitted to a queue using the bsub -q <queue_name> command, where <queue_name> is the name of the queue (Table 1, column 1).

Table 1 summarises important specifications for each queue, such as the run time limits and the CPU core limits. If no queue is specified, LSF schedules the job to the short-serial queue by default.

Table 1. LOTUS queues and their specifications

Queue name    Max run time  Default run time  Max cores per job  Max cores per user  Priority
short-serial  24hrs         1hr               1                  2000                30
par-single    48hrs         1hr               16                 256                 25
par-multi     48hrs         1hr               256                256                 20
long-serial   168hrs        1hr               1                  256                 10
high-mem      48hrs         1hr               1                  48                  30

Note 1: Resources that the job requests must be within the resource allocation limits of the selected queue. 

Note 2: Any job requiring more than 4GB RAM (the memory per core of the lowest-specification host type in LOTUS) must specify the memory needed with the -R flag. See the guidance on how to estimate and allocate resources for jobs.

Note 3: The default value of -W (predicted wall time) is 1 hour for all five LSF queues. If you do not specify this option, the 1 hour default applies. A job that runs past its wall time, or past the maximum run time of its queue, will be terminated by the LSF scheduler.
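The run time limits in Table 1 can be checked before a job is submitted. The following is a minimal sketch, not part of LSF: the function names max_minutes, to_minutes and walltime_ok are hypothetical, and the limits are copied from Table 1. It relies on the fact that bsub accepts the -W value as [hours:]minutes.

```shell
# Maximum run time per queue in minutes, copied from Table 1.
max_minutes() {
  case "$1" in
    short-serial)                  echo $((24 * 60)) ;;
    par-single|par-multi|high-mem) echo $((48 * 60)) ;;
    long-serial)                   echo $((168 * 60)) ;;
    *) return 1 ;;  # unknown queue
  esac
}

# Convert an LSF -W value, written as [hours:]minutes, to minutes.
to_minutes() {
  echo "$1" | awk -F: '{ if (NF == 2) print $1 * 60 + $2; else print $1 + 0 }'
}

# Succeeds when the requested wall time fits the queue's maximum run time.
walltime_ok() {
  limit=$(max_minutes "$1") || return 2
  [ "$(to_minutes "$2")" -le "$limit" ]
}

if walltime_ok long-serial 72:00; then
  echo "72:00 fits long-serial"
fi
```

A check like this only catches requests above the queue maximum. The scheduler still enforces the limit on the running job.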

Queue priority

Queue priority defines the order in which queues are searched to determine which job will be processed. Queues are assigned a priority by the LSF administrator, where a higher number has a higher priority. Queues are serviced by LSF in order of priority from the highest to the lowest. 

Each of the queues listed above has been given a priority (Table 1, column 6) to ensure fair share scheduling. In general, the shorter run time queues have a higher priority than the longer run time queues so that shorter jobs complete sooner. For example, if one job is pending in the short-serial queue and another in the long-serial queue, the job in the short-serial queue will be scheduled to run first. Before submitting a job, select the most appropriate queue to prevent inefficient scheduling.

If multiple queues have the same priority, LSF schedules all the jobs from these queues in first-come, first-served order.
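As an illustration of the ordering described above, the sketch below sorts the Table 1 priorities from highest to lowest. It is not how LSF itself is implemented. The function names queue_priorities and service_order are hypothetical. On a live system, the LSF command bqueues reports the priority of each queue.

```shell
# Queue names and priorities, copied from Table 1.
queue_priorities() {
  cat <<'EOF'
short-serial 30
par-single 25
par-multi 20
long-serial 10
high-mem 30
EOF
}

# Highest priority first. The -s flag keeps the Table 1 order for ties,
# which is only a stand-in for LSF's first-come, first-served rule.
service_order() {
  queue_priorities | sort -s -k2,2nr | awk '{ print $1 }'
}

service_order
```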

Serial queues

Serial and array jobs that use a single CPU core should be submitted to one of the following serial queues, depending on the job duration and the memory requirement. The default queue is short-serial.
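The choice between the three serial queues can be sketched as a small helper. The function name serial_queue is hypothetical, and the thresholds are taken from Table 1 and the queue descriptions in this section. The sketch treats each maximum run time as inclusive, which is an assumption.

```shell
# Print the serial queue for a single-core job.
# $1 = run time in whole hours, $2 = memory in whole GB.
serial_queue() {
  hours=$1
  mem_gb=$2
  if [ "$mem_gb" -gt 64 ]; then
    # high-mem has a 48 hour maximum run time
    [ "$hours" -le 48 ] || return 1
    echo high-mem
  elif [ "$hours" -le 24 ]; then
    echo short-serial
  elif [ "$hours" -le 168 ]; then
    echo long-serial
  else
    return 1  # longer than any serial queue allows
  fi
}

serial_queue 2 4      # short job, modest memory
serial_queue 100 4    # multi-day job
serial_queue 10 128   # large memory job
```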

short-serial

Serial or array jobs with a single CPU core and a run time of less than 24 hrs should be submitted to the short-serial queue. This queue has the highest priority of 30. A user can have a maximum of 2000 jobs running in the short-serial queue, provided each job's resources are within the resource limits of this queue. An example is shown below:

$ bsub -q short-serial -W 00:05 -o %J.out -e %J.err /bin/hostname
Job <2170892> is submitted to queue <short-serial>.

$ bjobs
JOBID     USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
2170892   fchami  RUN   short-seri jasmin-sci1 host171.jc. */hostname  Oct 12 18:55

Note: to display job information without truncating fields, use the wide format option: bjobs -w.
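For array jobs, a job script along the following lines could be used. This is a sketch under assumptions: the file name array_job.bsub, the array name myarray and the range 1-10 are hypothetical. The -J "name[range]" syntax, the %I file name pattern and the LSB_JOBINDEX variable are standard LSF job array features.

```shell
# Write a hypothetical array job script. The quoted 'EOF' stops the shell
# from expanding ${LSB_JOBINDEX} while the file is being written.
cat > array_job.bsub <<'EOF'
#!/bin/bash
#BSUB -q short-serial
#BSUB -J "myarray[1-10]"
#BSUB -W 00:30
#BSUB -o %J_%I.out
#BSUB -e %J_%I.err

# LSF sets LSB_JOBINDEX to the index of this task (1 to 10 here).
echo "Task ${LSB_JOBINDEX} running on $(hostname)"
EOF

# Submit with: bsub < array_job.bsub
```

Each task counts as a single-core job, so the per-job and per-user limits of the queue apply to the array as a whole.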

long-serial

Serial or array jobs with a single CPU core and a run time greater than 24 hrs and less than 168 hrs (7 days) should be submitted to the long-serial queue. This queue has the lowest priority of 10, so jobs might take longer to be scheduled than jobs in higher priority queues.

$ bsub -q long-serial  -o %J.out -e %J.err /bin/hostname
Job <2171658> is submitted to queue <long-serial>.

$ bjobs
JOBID     USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
2171658   fchami  RUN   long-seria jasmin-sci1 host073.jc. */hostname Oct 12 19:06

high-mem 

Serial or array jobs with a single CPU core and a high memory requirement (> 64 GB) should be submitted to the high-mem queue. The job must not exceed the maximum run time limit of 48hrs. This queue is not configured to accept exclusive jobs.

$ bsub -q high-mem -o %J.out -e %J.err /bin/hostname
Job <3531310> is submitted to queue <high-mem>.

$ bjobs
JOBID     USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
3387435   fchami  PEND  high-mem   jasmin-sci1 host291.jc  */hostname Oct 19 16:38
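A high-mem submission would normally also state the memory it needs with the -R flag. The sketch below only prints such a command so it can be inspected before use. The helper name highmem_cmd, the script name my_analysis.sh and the 12 hour wall time are hypothetical. The rusage[mem=...] resource string is standard LSF syntax, but the unit (assumed here to be MB) depends on the site configuration.

```shell
# Print (not run) a high-mem submission with a memory reservation.
# $1 = memory to reserve, assumed to be in MB; remaining args = the command.
highmem_cmd() {
  mem_mb=$1
  shift
  printf 'bsub -q high-mem -W 12:00 -R "rusage[mem=%s]" -o %%J.out -e %%J.err %s\n' \
    "$mem_mb" "$*"
}

highmem_cmd 100000 ./my_analysis.sh
```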

Parallel queues

Jobs requiring more than one CPU core should be submitted to one of the following parallel queues, depending on the type of parallelism: shared memory or distributed memory.

par-single

Shared memory multi-threaded jobs with a maximum of 16 threads should be submitted to the par-single queue. Each thread should be allocated one CPU core. Running more threads than allocated CPU cores (oversubscription) will make the job run very slowly. Specify the number of CPU cores with the LSF submission flag bsub -n <number of CPU cores>, or by adding the LSF directive #BSUB -n <number of CPU cores> to the job script file. An example is shown below:

$ bsub -q par-single  -n 10 <  singlenode.bsub
Job <2338714> is submitted to queue <par-single>.

$ bjobs
JOBID     USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
2338714   fchami  RUN   par-single jasmin-sci1 10*host290. singlenode Oct 13 10:15

Note: Jobs submitted with more than 16 CPU cores will be terminated (killed) by the LSF scheduler with the following statement in the job output file:

TERM_PROCESSLIMIT: job killed after reaching LSF process limit.
Exited with exit code 254.
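A job script for a multi-threaded job might look like the sketch below. It is an illustration under assumptions, not the content of the singlenode.bsub file used above: the executable my_threaded_app is hypothetical, and the use of OpenMP (OMP_NUM_THREADS) is assumed. LSB_DJOB_NUMPROC is set by LSF to the number of allocated slots, and span[hosts=1] asks LSF to place all cores on one host.

```shell
#!/bin/bash
#BSUB -q par-single
#BSUB -n 8
#BSUB -R "span[hosts=1]"
#BSUB -W 01:00
#BSUB -o %J.out
#BSUB -e %J.err

# One thread per allocated core avoids oversubscription.
# Outside LSF the variable is unset, so fall back to a single thread.
export OMP_NUM_THREADS="${LSB_DJOB_NUMPROC:-1}"
echo "Using ${OMP_NUM_THREADS} threads"

# ./my_threaded_app   # hypothetical executable, uncomment to run
```

Deriving the thread count from the allocation, rather than hard-coding it, keeps the script correct if the -n value is later changed.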

par-multi

Distributed memory jobs with inter-node communication using the MPI library should be submitted to the par-multi queue. Each MPI process (rank) should be allocated a single CPU core. Specify the number of CPU cores with the LSF submission flag bsub -n <number of CPU cores>, or by adding the LSF directive #BSUB -n <number of CPU cores> to the job script file. An example is shown below:

$ bsub -x -q par-multi -n 24 <  multinodes.bsub
Job <2338707> is submitted to queue <par-multi>.

$ bjobs 
JOBID     USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
2338707   fchami  RUN   par-multi  jasmin-sci1 16*host285.j multinodes Oct 13 10:06
                                               8*host282.jc.rl.ac.uk

Note 1: The number of CPU cores is passed from the LSF submission flag -n. Do not add the -np flag to the lotus.mpirun command.

Note 2: Adding the -x option to the bsub command puts the host running your job into exclusive execution mode, so the host is not shared with other jobs. This is recommended only for very large memory jobs or parallel MPI jobs.

Note 3: LSF will terminate a job that requests more CPU cores than the limit of 256.
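An MPI job script consistent with the notes above could be sketched as follows. The file name multinodes_sketch.bsub, the executable my_mpi_app and the 2 hour wall time are hypothetical, and any environment setup the MPI library needs is omitted. The lotus.mpirun launcher is used as named in Note 1.

```shell
# Write a hypothetical MPI job script for the par-multi queue.
cat > multinodes_sketch.bsub <<'EOF'
#!/bin/bash
#BSUB -q par-multi
#BSUB -n 24
#BSUB -W 02:00
#BSUB -o %J.out
#BSUB -e %J.err

# The rank count comes from the -n value above, so no rank-count flag is passed.
lotus.mpirun ./my_mpi_app
EOF

# Submit with: bsub < multinodes_sketch.bsub
```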

Reservation of resources

It is possible to make resources available for certain use cases by setting a reservation code for a given set of resources: compute time, number of CPUs and memory. Please contact CEDA support to enquire about the criteria for requesting a reservation of resources.
