Ingest script writing conventions

The working versions of ingest scripts are generally in the badc home area under software/datasets.

The scripts should be kept in a git repository on the internal gitlab (https://breezy.badc.rl.ac.uk/), both to record changes and to act as a preserved copy of the scripts. Ingest scripts may contain passwords for external services, so the internal gitlab repository is the most suitable place for them.

The scripts used for ingest are generally small-scale and hard to separate from the configuration used in their deployment. Configuration and scripts both change very frequently, so they should be kept in the same package.

Creating an ingest script package

Use the gitlab web interface to make a package.

Clone the package into the software/datasets directory using git clone.
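
The clone step could be wrapped in a small helper along the following lines. This is only a sketch: the function name clone_ingest_package is hypothetical, the default directory $HOME/software/datasets assumes the script is run as the badc user, and the repository URL must be taken from the package's page in the gitlab web interface.

```shell
#!/bin/sh
# clone_ingest_package REPO_URL [DATASETS_DIR]
# Hypothetical helper: clone an ingest script package into the datasets
# directory. REPO_URL is the clone URL shown by the gitlab web interface.
# DATASETS_DIR defaults to $HOME/software/datasets (an assumption).
clone_ingest_package() {
    repo_url=$1
    datasets_dir=${2:-"$HOME/software/datasets"}
    mkdir -p "$datasets_dir"
    # Name the working copy after the repository, dropping any .git suffix.
    name=$(basename "$repo_url" .git)
    git clone "$repo_url" "$datasets_dir/$name"
}
```

Once the function is defined in the shell, it would be called with the clone URL as its first argument, optionally followed by a different target directory.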

Updating scripts and configuration

Edit the files, then stage and commit the changes to git:

git status

git add <changed files>

git commit -m 'a commit message'

Push the commits to gitlab

git push
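
The whole edit, commit and push cycle can be tried out safely before touching a real package. The sketch below runs it against a throwaway local bare repository that stands in for gitlab; the file name ingest.sh, the directory names and the git identity are all hypothetical.

```shell
#!/bin/sh
# Sketch of the edit/commit/push cycle, run against a throwaway local
# bare repository standing in for the gitlab remote (an assumption, so
# that the example never touches the real server).
set -eu

work=$(mktemp -d)
git init --quiet --bare "$work/remote.git"        # stand-in for gitlab
git clone --quiet "$work/remote.git" "$work/my_ingest" 2>/dev/null
cd "$work/my_ingest"
git config user.name "Example User"               # hypothetical identity
git config user.email "user@example.invalid"

echo "echo ingesting" > ingest.sh                 # hypothetical script file
git status --short                                # lists ingest.sh as untracked
git add ingest.sh                                 # stage the named file
git commit --quiet -m 'Add ingest script'
git push --quiet origin HEAD                      # publish the commit to the remote
```

Naming the files passed to git add, rather than staging everything, makes it less likely that stray files in the working directory are committed by accident.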

Scheduling and running

If there is a chance that the script will be run again, either manually or as a scheduled job, then it should be added to ingest_control (https://ceda-internal.helpscoutdocs.com/article/4272-ingest-control). Adding a script to ingest_control allows other users to control the dataset workflow when the original creator of the script is unavailable, and also documents the setup and environment for the script.