Example LOTUS Job 1: CMIP5 Archive Analysis

This article provides an example of how LOTUS can be used to work with a large data set held in the CEDA Archive. This example shows how to:

  • access data within the CMIP5 archive
  • use LOTUS to process individual files

Scientific description: Calculation of Arctic mean time series

In order to demonstrate one way of using LOTUS, this article uses a simple analysis script to process data from the CMIP5 data set within the CEDA archive.

The Python script arctic_mean.py uses the iris library (part of the SciTools package) for a simple example data-processing task. The procedure followed by the script is as follows (a sketch of such a script is shown after the list):

  • load all data from the directory specified in the first command-line argument into a single time series (dealing with inconsistent metadata)
  • extract data for the Arctic region, in this case all latitudes north of 67°N
  • perform an area-weighted mean over the latitude and longitude dimensions
  • write the data to a netCDF file in the directory specified in the second command-line argument, with a file name constructed from the original path.
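The listing below is a minimal sketch of how such a script could be written with iris; it is not the actual arctic_mean.py, and details such as the metadata handling, the bounds handling and the output file-naming scheme are illustrative assumptions (the import location of equalise_attributes also varies between iris versions).

# arctic_mean_sketch.py -- a hypothetical approximation of arctic_mean.py
import os
import sys

import iris
import iris.analysis
import iris.analysis.cartography
from iris.util import equalise_attributes


def main(input_dir, output_dir):
    # Load all files in the input directory and join them into a single
    # time series, removing inconsistent metadata first.
    cubes = iris.load(os.path.join(input_dir, "*.nc"))
    equalise_attributes(cubes)
    cube = cubes.concatenate_cube()

    # Extract the Arctic region: all latitudes north of 67N.
    arctic = cube.extract(iris.Constraint(latitude=lambda cell: cell > 67))

    # Area-weighted mean over the latitude and longitude dimensions.
    for name in ("latitude", "longitude"):
        if not arctic.coord(name).has_bounds():
            arctic.coord(name).guess_bounds()
    weights = iris.analysis.cartography.area_weights(arctic)
    mean = arctic.collapsed(["latitude", "longitude"], iris.analysis.MEAN,
                            weights=weights)

    # Write the result to a netCDF file named from the original path
    # (this naming scheme is illustrative only).
    out_name = "_".join(input_dir.strip("/").split("/")[-6:]) + ".nc"
    iris.save(mean, os.path.join(output_dir, out_name))


if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])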

Listing example files in the CMIP5 archive on JASMIN

The CMIP5 archive contains a vast amount of data that we could look at. Here, we are going to use monthly mean surface air temperature from each of the RCP 8.5 simulations.

The directory structure of the CEDA archive is approximately

/badc/cmip5/data/cmip5/output1/{Institute}/{Model}/{experiment}/{output frequency}/{model component}/{CMIP5 table name}/{realisation}/{version}/{CMOR variable name}/

The command

ls -d /badc/cmip5/data/cmip5/output1/*/*/rcp85/mon/atmos/Amon/*/latest/tas/ > rcp85_tas_directories

creates a list of all the directories in the archive containing monthly mean surface air temperature for the RCP8.5 simulations and puts this in the file rcp85_tas_directories.

Submission of batch jobs

This list contains 98 directories, corresponding to the various realisations of the different models with data in the archive. To run the arctic_mean.py script on each of these directories, the following commands can be used to set up and submit a set of simple batch jobs to LOTUS, writing log information and results to separate directories:

mkdir log data
for d in `cat rcp85_tas_directories`;
do
    bsub -o ./log/%J.out -q short-serial python2.7 arctic_mean.py ${d} ./data/
done

Each of these batch jobs should complete quickly, as the analysis code is simple. Running commands such as bjobs or bjobs -sum will give you information on how the jobs are progressing.

The output log files will be placed in the log sub-directory when the jobs complete and will contain all the text written to standard output/error, along with a description of the command executed and information about the resources used by the job. The results of the analysis will end up in the `data` sub-directory if the corresponding job completes successfully. Note that not all of the jobs will succeed; analysis of the log files shows that arctic_mean.py does not cope adequately with the time coordinate information in a quarter of cases.
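One quick way to see how many jobs produced a result is to compare the number of files in the data sub-directory with the number of input directories. The snippet below is a rough sketch that assumes the file and directory layout used above.

# check_results.py -- count result files against the list of input directories
import glob

with open("rcp85_tas_directories") as f:
    n_expected = sum(1 for line in f if line.strip())

n_results = len(glob.glob("./data/*.nc"))
print("%d of %d jobs produced a result file" % (n_results, n_expected))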

A simple analysis plot of the results can then be made.
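The snippet below is one possible sketch of such a plot, not the plot from the original article; it assumes the result files written by arctic_mean.py are in the data sub-directory and that matplotlib is available.

# plot_results.py -- overlay the Arctic mean time series from each result file
import glob

import iris
import iris.quickplot as qplt
import matplotlib.pyplot as plt

for path in sorted(glob.glob("./data/*.nc")):
    cube = iris.load_cube(path)
    qplt.plot(cube, linewidth=0.5)

plt.title("Arctic mean surface air temperature, RCP8.5")
plt.show()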
