Running multiple commands on LOTUS using the "run-multi" script

This article explains how you can execute multiple commands in parallel on LOTUS using the run-multi script.

Using run-multi to run multiple commands in parallel on LOTUS

A script called run-multi is provided on LOTUS. It gives you a convenient way to run multiple serial commands simultaneously within a single job, which you submit to the queues in the same way as a parallel job.

Within your job, the maximum number of commands run simultaneously is equal to the number of processors that you have requested for your job (the -n flag to bsub). If the total number of commands to be run exceeds this, then after the first n have been launched, the run-multi script waits for one to complete before launching another. Once all commands have been launched, it waits for them all to finish, and then the job terminates.
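The scheduling behaviour described above can be pictured with a small bash sketch. This is not the run-multi source code: the function name run_bounded is hypothetical, it runs everything on a single host, and it assumes bash 4.3 or later (for wait -n). It only illustrates the idea of keeping at most n commands running and launching the next one as soon as a slot becomes free.

```shell
#!/bin/bash
# Illustrative sketch only: NOT the run-multi implementation.
# Reads one command per line from standard input and keeps at most
# max_jobs of them running at the same time (single host only).
run_bounded() {
    local max_jobs=$1 running=0 line
    while IFS= read -r line; do
        # Ignore blank lines.
        [ -z "$line" ] && continue
        if [ "$running" -ge "$max_jobs" ]; then
            # All slots are busy: block until any one command exits.
            wait -n || true
            running=$((running - 1))
        fi
        # Launch the command in the background, detached from our stdin
        # so that it cannot consume the remaining command lines.
        bash -c "$line" </dev/null &
        running=$((running + 1))
    done
    # Every command has been launched: wait for the remainder to finish.
    wait
}
```

For example, `printf 'sleep 2\nsleep 1\nsleep 1\n' | run_bounded 2` would start the first two commands immediately and the third as soon as either of them exits.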

Typically this will be a convenient solution where you have a large number of commands to be run, with possibly unequal run times, as it does not require you to predict how best to split the overall task into a reasonable number of one-processor jobs to submit to the queues.

Here is a usage example, which calculates checksums of a batch of files up to 10 at a time (2 CPU cores/processors per host/node on 5 nodes):

[user@jasmin-sci1 ~]% cat myscript.bsub
#!/bin/bash
#BSUB -q par-multi
#BSUB -J checksumjob
#BSUB -n 10
#BSUB -R "span[ptile=2]"
#BSUB -o %J.o
#BSUB -e %J.e
#BSUB -W 1:00

module add contrib/ceda

run-multi <<EOF
cksum /group_workspaces/jasmin/hiresgw/xjaro/xjaroa.pk19910301_00
cksum /group_workspaces/jasmin/hiresgw/xjaro/xjaroa.pk19910301_01
cksum /group_workspaces/jasmin/hiresgw/xjaro/xjaroa.pk19910301_02
cksum /group_workspaces/jasmin/hiresgw/xjaro/xjaroa.pk19910301_03
cksum /group_workspaces/jasmin/hiresgw/xjaro/xjaroa.pk19910301_04
cksum /group_workspaces/jasmin/hiresgw/xjaro/xjaroa.pk19910301_05
cksum /group_workspaces/jasmin/hiresgw/xjaro/xjaroa.pk19910301_06
cksum /group_workspaces/jasmin/hiresgw/xjaro/xjaroa.pk19910301_07
cksum /group_workspaces/jasmin/hiresgw/xjaro/xjaroa.pk19910301_08
cksum /group_workspaces/jasmin/hiresgw/xjaro/xjaroa.pk19910301_09
cksum /group_workspaces/jasmin/hiresgw/xjaro/xjaroa.pk19910301_10
cksum /group_workspaces/jasmin/hiresgw/xjaro/xjaroa.pk19910301_11
cksum /group_workspaces/jasmin/hiresgw/xjaro/xjaroa.pk19910301_12
cksum /group_workspaces/jasmin/hiresgw/xjaro/xjaroa.pk19910301_13
cksum /group_workspaces/jasmin/hiresgw/xjaro/xjaroa.pk19910301_14
cksum /group_workspaces/jasmin/hiresgw/xjaro/xjaroa.pk19910301_15
cksum /group_workspaces/jasmin/hiresgw/xjaro/xjaroa.pk19910301_16
cksum /group_workspaces/jasmin/hiresgw/xjaro/xjaroa.pk19910301_17
cksum /group_workspaces/jasmin/hiresgw/xjaro/xjaroa.pk19910301_18
cksum /group_workspaces/jasmin/hiresgw/xjaro/xjaroa.pk19910301_19
EOF

This is submitted in the usual way on LOTUS, e.g.:

[user@jasmin-sci1 ~]% bsub < myscript.bsub
Job <472943> is submitted to par-multi queue <lotus>.

An explanation of the lines follows:

  • #BSUB -q par-multi - job submitted to the par-multi parallel queue on LOTUS for execution
  • #BSUB -J checksumjob - optional flag with name of job as reported in queue listings
  • #BSUB -n 10 - requests that the job runs on 10 processors. In fact there are 20 commands to run, so 10 will start up immediately, and the remaining 10 will be launched later as others complete.
  • #BSUB -R "span[ptile=2]" - requests that 2 processors are allocated per node. This setting is potentially important. By default, the queues will pack processors onto nodes, which in the case of LOTUS means 16 processors per node. For CPU-limited jobs this is ideal because it makes the most efficient use of the CPU resources on each node. For such jobs, please omit the "#BSUB -R" line and allow the default to be used. However, in this particular example, the limiting factor is likely to be network contention when accessing the Panasas storage. Spreading the commands across more nodes achieves greater I/O throughput. Where commands have a large memory footprint (although not in this example), this may be another reason to reduce the number of processors requested per node. Exclusive node usage (#BSUB -x) has not been requested in this example, but the use of many nodes still reduces node availability for other jobs that do require exclusive node use, so please avoid using excessive numbers of nodes.
  • #BSUB -o %J.o - File for standard output, %J expands to job number.
  • #BSUB -e %J.e - File for standard error.
  • #BSUB -W 1:00 - Wall clock time limit (1 hour).
  • module add contrib/ceda - needed in order to find the run-multi executable
  • run-multi <<EOF - The lines of script must be presented to run-multi on standard input. In this example, a "here document" is used, terminated by the EOF line. However, any other means of providing the script on standard input is equally acceptable, for example run-multi < input_file or using a Unix pipe.
  • cksum ... - Each line should contain a stand-alone command to be run. (Do not try to use shell-script syntax such as loops in the input to run-multi; if necessary, put that logic in a separate script and give the command that runs it.)
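Because loops cannot be used inside the input to run-multi, a long list of similar commands can instead be generated by a loop that runs before run-multi is called. The following is a minimal sketch that rebuilds the 20 cksum lines from the example; the function name make_commands and the file name commands.txt are hypothetical.

```shell
#!/bin/bash
# Sketch: build the run-multi input with a loop *outside* run-multi.
# The path prefix is the one used in the example job script.
prefix=/group_workspaces/jasmin/hiresgw/xjaro/xjaroa.pk19910301

make_commands() {
    local i
    for i in {0..19}; do
        # Emit one stand-alone command per line, with a two-digit suffix.
        printf 'cksum %s_%02d\n' "$prefix" "$i"
    done
}

# Write the command list to a file (hypothetical file name).
make_commands > commands.txt
```

In the job script, the here document could then be replaced by `run-multi < commands.txt`, or the file could be skipped altogether with `make_commands | run-multi`.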

On completion, the standard output file contains the checksums, and the standard error file contains some information from the controlling script. See attachments to this page for examples.

A slight inefficiency will arise during the final stage when the controlling script is waiting for the last remaining commands to complete, as other nodes will be idle during this time. Where the number of commands is large compared to the number of nodes allocated, this will be less of an issue. However, to help you identify any significant inefficiency, the controlling script will report (on standard error) the overall percentage utilisation so that you can tweak the job parameters for future jobs if this is particularly low.
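To get a feel for what a utilisation figure means, it can help to work through the arithmetic by hand. The sketch below assumes one plausible definition, namely the total run time of all commands divided by the processor time reserved by the job (processors multiplied by elapsed time). The exact formula used by the controlling script is not documented here, so treat both the definition and the function name utilisation_percent as assumptions.

```shell
#!/bin/bash
# Assumed definition, for illustration only:
#   utilisation % = 100 * (sum of command run times)
#                       / (processors * elapsed wall-clock time)
utilisation_percent() {
    local busy_seconds=$1 processors=$2 elapsed_seconds=$3
    # Integer arithmetic; the result is rounded down.
    echo $(( 100 * busy_seconds / (processors * elapsed_seconds) ))
}

# Hypothetical job: 10 processors, 120 s elapsed, and commands whose
# run times add up to 900 s.
utilisation_percent 900 10 120   # prints 75
```

Under this definition, a low figure suggests that processors spent much of the job idle, for instance while waiting for one long command at the end.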

If a line of input to run-multi contains only the word BARRIER, it is treated not as a command but as an instruction to wait for all currently running commands to complete before starting any commands on subsequent lines. This may be useful if certain commands depend on previous ones having completed, but avoid unnecessary barriers as they reduce the percentage utilisation.
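As an illustration, the sketch below writes a run-multi input file in which a final command must not start until three earlier ones have finished. The data file names and the two-stage workflow are hypothetical, as is the name barrier_commands.txt.

```shell
#!/bin/bash
# Sketch of run-multi input that uses a BARRIER line.
# The three gzip commands may run in parallel; the tar command on the
# last line is only started once all of them have completed.
cat > barrier_commands.txt <<'EOF'
gzip data_a.nc
gzip data_b.nc
gzip data_c.nc
BARRIER
tar -cf archive.tar data_a.nc.gz data_b.nc.gz data_c.nc.gz
EOF
```

The file would then be passed to run-multi on standard input from within the job script, for example `run-multi < barrier_commands.txt`.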
