How to allocate job resources

This article explains how to allocate resources for batch computing on LOTUS. It covers:

  • LOTUS queues
  • Job duration
  • Memory requirements  
  • Memory limit 
  • High memory node selection 
  • Exclusive host use 
  • Number of cores
  • Shared scratch space for temporary job files 

In a shared-resource environment, it is essential to specify precisely how a batch job should run on LOTUS. You must therefore allocate resources such as the queue, the memory requirement, the job duration and the number of cores by adding specific options to the job submission command bsub, as detailed below.

LOTUS queues

All jobs wait in queues until they are scheduled and dispatched to hosts. The short-serial queue is the default queue and should be used for all serial jobs, unless a job has a memory requirement of over 512 GB, in which case the high-mem queue should be used. For example, to submit a job to a given queue by its queue name:

$ bsub -q short-serial < myjob

To view the available queues, run the following command:

$ bqueues
QUEUE_NAME     PRIO  STATUS          MAX JL/U JL/P JL/H NJOBS  PEND   RUN  SUSP
test             40  Open:Active       -    -    -    -     0     0     0     0
cpom-comet       35  Open:Active     128    -    -    -  1664  1536   128     0
rsg-general      35  Open:Active     482    -    -    -     6     0     6     0
rsgnrt           35  Open:Active      30    -    -    -    18     0    18     0
copy             30  Open:Active       -    -    -    -     0     0     0     0
sst_cci          30  Closed:Inact     96    -    -    -     0     0     0     0
ingest           30  Open:Active       -    -    -    -     1     0     1     0
short-serial     30  Open:Active    3000 2000    -    - 49163 46166  2997     0
high-mem         30  Open:Active      96   48    -    -     0     0     0     0
par-single       25  Open:Active     512  256    -    -    20     0    20     0
par-multi        20  Open:Active     512  256    -    -   404   320    84     0
long-serial      10  Open:Active     512  256    -    -    31     0    31     0

Queues other than the five public queues (short-serial, long-serial, par-single, par-multi and high-mem) should be ignored, as they implement different job scheduling and control policies. A queue can use all server hosts in the cluster or a configured subset of them.

Note: A queue must show the STATUS Open:Active (open to submissions and actively dispatching) for a job to be dispatched.
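As a rough sketch of the note above, the bqueues listing can be filtered so that only queues currently able to dispatch jobs are shown. It assumes the column layout of the listing above (queue name in the first column, status in the third); the function name open_queues is illustrative and not part of LSF.

```shell
# open_queues: read bqueues-style lines on stdin and print the names of
# queues whose status column (field 3) is exactly Open:Active.
# Assumes the column layout shown in the listing above.
open_queues() {
    awk '$3 == "Open:Active" { print $1 }'
}

# Typical use on a LOTUS login host (requires LSF):
#   bqueues | open_queues
```

Applied to the listing above, this would skip sst_cci, which is shown as Closed:Inact.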

Job duration 

-W 00:30  Sets the run-time limit of your job to a predicted time in hours and minutes (here, 30 minutes). If you do not specify the run time with -W, the default maximum of 1 hour applies.

Each queue has a specific maximum allowed job duration (see Table 1). Any job exceeding this limit will be aborted automatically, even if a longer duration is specified.
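If you estimate run times in minutes, a small helper can turn the estimate into the hours:minutes form used with -W. This is only a sketch; to_hhmm is a hypothetical helper name, and it expects a whole number of minutes written without leading zeros.

```shell
# to_hhmm: convert a whole number of minutes into the HH:MM form used with -W.
to_hhmm() {
    printf '%02d:%02d\n' $(( $1 / 60 )) $(( $1 % 60 ))
}

# Example: request 90 minutes of run time (bsub requires LSF):
#   bsub -q short-serial -W "$(to_hhmm 90)" < myjob
```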

Specifying memory requirements

Any job requiring more than 4GB of RAM (the memory per core on the lowest-specification host type in LOTUS) must specify the memory needed with the -R flag:

$ bsub -R "rusage[mem=XXX]"

where XXX is the memory size in MB.

Any jobs using extra memory that have been submitted without this flag may be killed by the service administrators if found to be adversely affecting the performance of other users' jobs.

Memory limit

Memory limit control is enforced on jobs submitted to the serial queues. For a job whose memory requirement is greater than 8GB, the memory limit must also be specified; otherwise the default memory limit applies and the job will be terminated if it exceeds 8GB.

Note that in the following command:

$ bsub -R "rusage[mem=XXX]" -M YYY

XXX is the memory requirement in MB and YYY is the memory limit in KB.

In summary:

If...                                          Then...

bsub -R "rusage[mem=XXX]"                      the default memory limit of 8GB is enforced
e.g. bsub -R "rusage[mem=10000]"               this job will be killed when it exceeds 8GB

bsub -R "rusage[mem=XXX]" -M YYY
  if YYY < maxlimit = 64000000 KB (64GB)       YYY is enforced
  if YYY > maxlimit = 64000000 KB (64GB)       maxlimit is enforced
e.g. bsub -R "rusage[mem=15000]" -M 15000000   this job will be killed if it exceeds 15GB

Read the bsub manual page for more information about the -R and -M options, including other select keywords.
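Because the memory requirement is given in MB and the memory limit in KB, it is easy to mix up the two values. The following sketch derives both from a single figure in whole GB, using the same decimal convention as the 15GB example above and capping the limit at the 64000000 KB maximum quoted for the serial queues. The names MAX_LIMIT_KB, gb_to_mb and gb_to_limit_kb are hypothetical helpers, not LSF commands.

```shell
MAX_LIMIT_KB=64000000   # maximum memory limit quoted above (64GB)

# gb_to_mb: whole GB -> MB, for use in rusage[mem=...]
gb_to_mb() {
    echo $(( $1 * 1000 ))
}

# gb_to_limit_kb: whole GB -> KB, for use with -M, capped at MAX_LIMIT_KB
gb_to_limit_kb() {
    kb=$(( $1 * 1000000 ))
    if [ "$kb" -gt "$MAX_LIMIT_KB" ]; then
        kb=$MAX_LIMIT_KB
    fi
    echo "$kb"
}

# Example: a 15GB job (bsub requires LSF):
#   bsub -R "rusage[mem=$(gb_to_mb 15)]" -M "$(gb_to_limit_kb 15)" < myjob
```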

Selecting high-memory hosts

The second phase of LOTUS compute, added in spring/summer 2014, allows high-memory nodes to be selected using the bsub -R and -M options, for example:

$ bsub -R "select[maxmem > 128000]"

This selects from machines with more than 128000MB of physical RAM (units in -R strings are always MB), but it does not guarantee how much memory is allocated to the job. To target a host with enough free memory, add the resource usage:

$ bsub -R "select[maxmem > 128000] rusage[mem=150000]"

If you are concerned about the job using more memory than was allocated, you can add a memory limit using -M. If the job's memory usage exceeds the memory limit, LSF will terminate the job:

$ bsub -R "select[maxmem > 128000] rusage[mem=150000]" -M 160000000

12 such high-memory (512GB) hosts are currently available. It is possible to request more than 8 by prior arrangement via the CEDA helpdesk.

Exclusive host use

Adding the -x option to the bsub command puts the host running your job into exclusive execution mode, so that it is not shared with other jobs. This is recommended only for very large memory jobs or parallel MPI jobs.

$ bsub -x < myscript

Spanning multiple hosts for additional memory per process

The span option restricts the number of processes run per host. For example, to run only one process per host, use:

$ bsub -R "span[ptile=1]" < myscript

Number of cores

LSF can allocate more than one core to a job and automatically keeps track of the job status while a parallel job is running. When submitting a parallel job that requires multiple cores, you can specify the exact number of cores to use.

To submit a parallel job, use -n <number of cores> to specify the number of cores/processors the job requires. For example:

$ bsub -n 4 myjob

The job "myjob" is submitted as a parallel job and starts when four cores are available.
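The options described in this article can also be embedded in the job script itself as #BSUB lines, which LSF reads when the script is passed to bsub on standard input. The sketch below generates such a script; make_job_script and ./my_program are hypothetical names, and the queue chosen in the example is only illustrative.

```shell
# make_job_script QUEUE WALLTIME NCORES COMMAND
# Prints a job script whose #BSUB lines carry the same options that would
# otherwise be given on the bsub command line.
make_job_script() {
    cat <<EOF
#!/bin/bash
#BSUB -q $1
#BSUB -W $2
#BSUB -n $3
$4
EOF
}

# Example (bsub requires LSF):
#   make_job_script par-single 00:30 4 "./my_program" > myjob
#   bsub < myjob
```

Keeping the options in the script records the resources requested alongside the commands that were run.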

The /work/scratch and /tmp directories

The /work/scratch directory is a temporary filespace shared across the whole LOTUS cluster, allowing parallel jobs to access the same files over the course of their execution. This directory uses the Panasas high-speed parallel file system. Please create a subdirectory for your files (here called newuser):

$ mkdir /work/scratch/newuser

In contrast, the /tmp directories are local directories, one per node. They can be used to store small temporary data files for fast access by the local process. Please make sure that your jobs delete any files in /tmp when they complete. Large volumes of data cannot be stored on the local /tmp disk; use the /work/scratch directory or group workspaces instead, but be sure to remove the data as soon as possible afterwards.

Data in these directories is temporary and may be arbitrarily removed at any point once your job has finished running. Do not use them to store important output for any significant length of time. Any important data should be written to a group workspace so that you do not lose it.
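One way to make sure temporary files are always removed is to give each job its own scratch subdirectory and delete it when the work finishes. The following is a minimal sketch, assuming your subdirectory under /work/scratch is named after your username; SCRATCH_ROOT and run_in_scratch are hypothetical names introduced for illustration.

```shell
# SCRATCH_ROOT is an assumption: a per-user subdirectory of /work/scratch.
SCRATCH_ROOT=${SCRATCH_ROOT:-/work/scratch/$(id -un)}

# run_in_scratch COMMAND [ARGS...]
# Creates a private directory under SCRATCH_ROOT, runs the command inside it,
# then removes the directory whether or not the command succeeded.
run_in_scratch() {
    job_dir=$(mktemp -d "$SCRATCH_ROOT/job.XXXXXX") || return 1
    status=0
    ( cd "$job_dir" && "$@" ) || status=$?
    rm -rf "$job_dir"
    return "$status"
}

# Example inside a job script (./my_program is a placeholder):
#   run_in_scratch ./my_program
```

Any output worth keeping should be copied to a group workspace by the command itself, since the directory is deleted as soon as the command returns.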
