Using the command line for CREPP
CREPP (the CEDA Receive to Publish Pipeline) was developed by Alan Iwi and Ag Stephens to ingest CMIP6 data and then publish it to ESGF. Data for ingest will come from the Met Office MASS archive, from data already on JASMIN, or from replicated international data.
Managing CREPP transfers
CREPP dataset statuses can be seen through the CREPP web interface at ppln.ceda.ac.uk, most commonly using the dataset view.
Logs are held on the two machines used by CREPP; some logs are only available on the appropriate machine (confirm with Alan?).
If a dataset is found to be in error and you have admin rights, it can be retried through the web interface. However, this is only feasible for a very small number of datasets.
For bulk dataset retries, use the dataset_actions.sh script located in the scripts directory. This script only resets flags, so it does not matter which server it is run from. From the scripts/ directory, its usage is:
usage: dataset_actions.py [-h]
                          (-R | -S | -P <priority> | --clear-priority | -p | -r | -u | -q | -W | --republish)
                          [-v]
                          config [datasets [datasets ...]]

positional arguments:
  config                CREPP configuration name (e.g. 'esgf-prod')
  datasets              dataset ID(s); if none, they are read from stdin

optional arguments:
  -h, --help            show this help message and exit
  -R, --retry           retry failed step
  -S, --skip            skip failed step
  -P <priority>, --set-priority <priority>
                        set dataset priority
  --clear-priority      clear dataset priority
  -p, --pause           pause dataset
  -r, --release         release dataset pause
  -u, --update          update processing status and last event
  -q, --query           query status
  -W, --withdraw        withdraw dataset
  --republish           republish withdrawn dataset
  -v, --verbose         more verbose output
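Because the script reads dataset IDs from stdin when none are given on the command line, a bulk retry can be driven from a plain text file of IDs. The following is a minimal sketch only: the list file, the clean_ids and bulk_retry helper names, and the ACTIONS path are illustrative assumptions, and 'esgf-prod' is simply the example configuration name from the usage text. If the script is not found, the sketch only reports what it would have run.

```shell
#!/bin/sh
# Sketch: bulk-retry failed datasets listed one per line in a text file.
# ASSUMPTIONS: helper names, list file and ACTIONS path are hypothetical.

CONFIG="${CONFIG:-esgf-prod}"               # example config name from the usage text
ACTIONS="${ACTIONS:-./dataset_actions.sh}"  # assumes we are in the scripts/ directory

# Drop blank lines and '#' comment lines, then remove duplicate IDs.
clean_ids() {
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1" | sort -u
}

# Retry the failed step (-R) for every dataset in the list, passing IDs on stdin.
bulk_retry() {
    if [ -x "$ACTIONS" ]; then
        clean_ids "$1" | "$ACTIONS" -R -v "$CONFIG"
    else
        # Script not found: only report what would have been run.
        echo "would run: $ACTIONS -R -v $CONFIG (IDs from $1)" >&2
    fi
}
```

With these helpers defined, `bulk_retry failed_datasets.txt` would retry every dataset in the (hypothetical) file failed_datasets.txt. Swapping -R for -q in the same pipeline should query the status of the listed datasets instead.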
Removing from the archive (esgdrs errors)
If a dataset needs removing from the archive, whether to clear an esgdrs.out error or for some other reason, run the following on ingest3:
~/software/crepp/crepp/scripts/remove-from-archive.sh < <dataset_list>
Then retry the ingest step.
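The removal and the retry can be chained so that the retry only happens if the removal succeeded. This is a sketch under stated assumptions, not the established procedure: it assumes dataset_actions.sh lives in the same scripts directory as remove-from-archive.sh, that -R retries the failed ingest step for these datasets, and that 'esgf-prod' is the relevant configuration. The remove_and_retry name is hypothetical.

```shell
#!/bin/sh
# Sketch: remove datasets from the archive, then retry their failed step.
# ASSUMPTIONS: both scripts share one directory; CONFIG is the example name.

SCRIPTS="${SCRIPTS:-$HOME/software/crepp/crepp/scripts}"
REMOVE="${REMOVE:-$SCRIPTS/remove-from-archive.sh}"
ACTIONS="${ACTIONS:-$SCRIPTS/dataset_actions.sh}"
CONFIG="${CONFIG:-esgf-prod}"

remove_and_retry() {
    list="$1"
    # Refuse to run with a missing or empty dataset list.
    if [ ! -s "$list" ]; then
        echo "empty or missing dataset list: $list" >&2
        return 1
    fi
    if [ -x "$REMOVE" ] && [ -x "$ACTIONS" ]; then
        # Only retry if the removal step exited successfully.
        "$REMOVE" < "$list" && "$ACTIONS" -R "$CONFIG" < "$list"
    else
        # Scripts not found (e.g. not on ingest3): report instead of running.
        echo "dry run: $REMOVE < $list, then $ACTIONS -R $CONFIG < $list"
    fi
}
```

For example, `remove_and_retry datasets_to_remove.txt`, where datasets_to_remove.txt is a hypothetical file with one dataset ID per line.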