Using the command line for CREPP

CREPP, the CEDA Receive-to-Publish Pipeline, was developed by Alan Iwi and Ag Stephens to ingest CMIP6 data and then publish it to ESGF. Data for ingest comes from the Met Office MASS archive, from data already on JASMIN, or from replicated international data.

Managing CREPP transfers

The status of CREPP datasets can be seen in the CREPP web interface at ppln.ceda.ac.uk, most commonly through the dataset view.

Debugging

Logs are kept on the two machines used by CREPP, and some logs are only available on the appropriate machine (confirm with Alan?):

ingest3
~/software/crepp/crepp/logs

esgf-pub 
/usr/local/crepp/logs
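To find which log files mention a particular dataset, a recursive grep over the log directory is usually enough. The sketch below assumes the logs are plain text files that contain the dataset ID verbatim; the function name find_dataset_in_logs is hypothetical and not part of CREPP. It defaults to the ingest3 log path, so pass /usr/local/crepp/logs as the second argument when running it on esgf-pub.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CREPP): list log files that mention a dataset.
# Assumes the CREPP logs are plain text and contain the dataset ID verbatim.
find_dataset_in_logs() {
    local dataset_id=$1
    # Default to the ingest3 log directory; pass /usr/local/crepp/logs on esgf-pub.
    local log_dir=${2:-$HOME/software/crepp/crepp/logs}
    # -r: recurse, -l: print matching file names only, -F: match the ID as a fixed string.
    grep -rlF -- "$dataset_id" "$log_dir" | sort
}
```

For example, find_dataset_in_logs <dataset_id> /usr/local/crepp/logs lists the esgf-pub log files that refer to that dataset.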

If a dataset is in error and you have admin rights, it can be retried using the web interface. However, this is only practical for a very small number of datasets.

For bulk retries, use the dataset_actions.sh script in the scripts/ directory. This script only resets flags, so it does not matter which server it is run from.

From the scripts/ directory, list the available options with:

dataset_actions.sh -h

usage: dataset_actions.py [-h]
                          (-R | -S | -P <priority> | --clear-priority | -p | -r | -u | -q | -W | --republish)
                          [-v]
                          config [datasets [datasets ...]]
positional arguments:
  config                CREPP configuration name (e.g. 'esgf-prod')
  datasets              dataset ID(s); if none, they are read from stdin

optional arguments:
  -h, --help            show this help message and exit
  -R, --retry           retry failed step
  -S, --skip            skip failed step
  -P <priority>, --set-priority <priority>
                        set dataset priority
  --clear-priority      clear dataset priority
  -p, --pause           pause dataset
  -r, --release         release dataset pause
  -u, --update          update processing status and last event
  -q, --query           query status
  -W, --withdraw        withdraw dataset
  --republish           republish withdrawn dataset
  -v, --verbose         more verbose output
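Because dataset_actions.sh reads dataset IDs from stdin when none are given on the command line, a bulk retry is a matter of piping in a list. The sketch below is a hypothetical wrapper, not part of CREPP: it assumes the list file holds one dataset ID per line (blank lines and # comments allowed), that it is run from the scripts/ directory, and it uses esgf-prod only because that is the example config name in the help text.

```shell
#!/usr/bin/env bash
# Sketch only, not part of CREPP: apply one dataset_actions.sh action to a list file.
# Assumptions: one dataset ID per line; blank lines and '#' comments may appear.

# Print each dataset ID once, dropping comments, blank lines and surrounding spaces.
clean_dataset_list() {
    sed -e 's/#.*//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$1" \
        | grep -v '^$' \
        | awk '!seen[$0]++'
}

# Pipe the cleaned IDs into dataset_actions.sh, which reads IDs from stdin
# when none are given. DATASET_ACTIONS defaults to ./dataset_actions.sh,
# i.e. the wrapper is assumed to be run from the scripts/ directory.
run_dataset_action() {
    local flag=$1 config=$2 list_file=$3
    clean_dataset_list "$list_file" \
        | "${DATASET_ACTIONS:-./dataset_actions.sh}" "$flag" "$config"
}
```

For example, run_dataset_action -q esgf-prod failed.txt checks the current status of the listed datasets, and run_dataset_action -R esgf-prod failed.txt then retries their failed step. The wrapper only handles single-flag actions such as -R, -S, -p or -q; actions that take a value, such as -P <priority>, would need an extra argument.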

Removing from the archive - (esgdrs errors)

If a dataset needs to be removed from the archive (to clear an esgdrs.out error, or for any other reason), run the following on:

ingest3
~/software/crepp/crepp/scripts/remove-from-archive.sh < <dataset_list>

then retry the ingest step.
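The two steps can be chained for a list of datasets. This is a sketch under several assumptions, none of which are confirmed above: it is run on ingest3; dataset_actions.sh lives in the same scripts/ directory as remove-from-archive.sh; remove-from-archive.sh exits non-zero when it fails; and each dataset is currently failed at the ingest step, so that -R (retry failed step) re-runs ingest. The function name remove_and_retry is hypothetical, and esgf-prod is only the example config name from the help text.

```shell
#!/usr/bin/env bash
# Sketch only, not part of CREPP: remove datasets from the archive, then retry them.
# Assumed locations (override via environment if they differ on ingest3).
REMOVE_SCRIPT=${REMOVE_SCRIPT:-$HOME/software/crepp/crepp/scripts/remove-from-archive.sh}
ACTIONS_SCRIPT=${ACTIONS_SCRIPT:-$HOME/software/crepp/crepp/scripts/dataset_actions.sh}

remove_and_retry() {
    local config=$1 list_file=$2
    # Assumes remove-from-archive.sh exits non-zero on failure; if so, skip the retry.
    "$REMOVE_SCRIPT" < "$list_file" || return 1
    # Assumes each dataset is failed at ingest, so -R (retry failed step) re-runs ingest.
    "$ACTIONS_SCRIPT" -R "$config" < "$list_file"
}
```

For example, remove_and_retry esgf-prod dataset_list.txt. Stopping before the retry when the removal fails avoids re-running ingest against datasets that are still in the archive.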