Reviewing Data at CEDA: A step-by-step guide

Overview

1. Access & permissions

2. What do data providers see?

3. Components of a dataset delivery

4. Data centre requirements for deliveries

5. Editing an active delivery

6. Reviewing an active delivery

7. Approving a delivery for ingest

8. Rejecting a delivery

9. After approval, or after rejection...


Introduction

CEDA accepts data relevant to the atmospheric and Earth observation fields. This includes a wide range of instrumental, satellite and aircraft observations, analyses and model datasets of interest to the scientific community. These data arrive frequently and in large volumes at CEDA's Arrivals Service (note that you must be signed in with the correct admin permissions before you can view this page). Submitted deliveries must currently be approved manually by a reviewer before being ingested to the arrivals area in the data centre. This step-by-step guide outlines the process for reviewing a dataset delivery.

NERC data centres linked with CEDA Arrivals are as follows: CEDA, PDC, BODC, EIDC and NGDC.

CEDA staff, as well as staff from the above data centres, may act as reviewers for dataset deliveries. The arrivals page for a particular data centre can be accessed at the following URL: https://arrivals.ceda.ac.uk/<datacentre>/intro/. For BODC, for example: https://arrivals.ceda.ac.uk/bodc/intro/.
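The URL pattern above can be sketched in a few lines of Python. This is only an illustration: the function name `arrivals_intro_url` is hypothetical, and the assumption that every data centre uses its lower-case abbreviation in the URL is taken from the BODC example alone.

```python
ARRIVALS_BASE = "https://arrivals.ceda.ac.uk"


def arrivals_intro_url(datacentre: str) -> str:
    """Build the arrivals intro URL for a data centre abbreviation.

    Assumption: the URL uses the lower-case abbreviation, as in the
    BODC example; other data centres have not been checked.
    """
    slug = datacentre.strip().lower()
    if not slug.isalnum():
        raise ValueError(f"unexpected data centre abbreviation: {datacentre!r}")
    return f"{ARRIVALS_BASE}/{slug}/intro/"
```

For example, `arrivals_intro_url("BODC")` gives the BODC address quoted above.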

Step-by-step guide

  1. Access & permissions: If you have admin permissions on the CEDA Archive, you will see the "Reviewer" label on the bar at the top of the screen. The two links next to it are 'Deliveries' (a), which takes you to the list of dataset deliveries in all stages, and 'Admin' (b), which takes you to the site administration actions. The number next to 'Deliveries' is the number of deliveries that currently require action.

  2. What do data providers see?

    When data providers make a dataset delivery, the arrivals interface takes them through the following workflow stages:

    a) Deliveries portal - 
    The data provider is prompted to ensure that their dataset conforms to the requirements for archival at CEDA, and to agree to the deposit agreement. They then create a new delivery ('+ New delivery') or choose an existing delivery name associated with their user account (e.g. when uploading further data for an ongoing project).


    b) Upload data - 
    Here the user chooses the files to attach to the delivery. Several tools are available at this stage: unzip zipped files, fix bad names (this removes unusual characters and spacing), remove empty directories, remove zero length files and remove links. The user is also given the option to upload data via FTP or rsync.


    c) Review dataset - 
    d) Submit delivery - 
    e) Finish - 

  3. Components of a dataset delivery: The delivery summary page shows the following key components.
    • Owner: CEDA account username
    • Created: date and time, in the format D MON YYYY, H:MM a.m./p.m.
    • Data centre: One of the data centres listed in the introduction.
    • Total size of dataset: Expressed in GB.
    • Total number of files: ...
    • File formats: Common file formats for deliveries are: netCDF (.nc), BADC-CSV (.csv), NASA Ames (.na), HDF (.hdf), tar and compressed file types.
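A reviewer who has a local copy of a delivery may want to cross-check the size, file count and formats shown on the summary page. The following is a minimal sketch, not part of the arrivals service: the function name `summarise_delivery` is hypothetical, file formats are guessed from the file extension only, and 1 GB is assumed to mean 10^9 bytes.

```python
from collections import Counter
from pathlib import Path


def summarise_delivery(root: str) -> dict:
    """Summarise a local copy of a delivery directory.

    Assumptions: 1 GB = 10**9 bytes, and the file format is inferred
    from the (lower-cased) extension only.
    """
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    formats = Counter(p.suffix.lower() or "(none)" for p in files)
    return {
        "total_size_gb": total_bytes / 1e9,
        "total_files": len(files),
        "formats": dict(formats),
    }
```

Calling `summarise_delivery("path/to/delivery")` returns a dictionary whose values can be compared with the corresponding entries on the summary page.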
  4. Data centre requirements for deliveries:

    Reviewers should check that the following requirements (in the deliveries portal (a) section of the arrivals uploader) are fulfilled:

    • In the archive remit - e.g. Atmospheric or Earth Observation science for the CEDA Archive.
    • Available on an open licence.
    • Composed of a set of files that are laid out in an understandable directory structure.
    • A single one-off delivery with no need for updates.
    • In a recommended format, like NetCDF, with appropriate metadata conventions, like CF.
    • Not too voluminous - prior agreement with CEDA is needed if uploading more than 5 TB.
    • Accompanied by a full description of the dataset so that we can describe it in our catalogue.

    Reviewers should check that the catalogue link given by the data provider when making the delivery is active and that the record is suitably formatted. This link does not have to point to a CEDA catalogue record, but it should allow metadata to be harvested for such a record. If there is no catalogue link, the reviewer should check that a metadata.yaml file has been submitted. The contents of this YAML file allow the reviewer to create the CEDA catalogue record manually. YAML files might not link the record to its final archive path correctly, so this should be checked and corrected if necessary.
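The decision described above (catalogue link first, metadata.yaml as the fallback) can be sketched as follows for a local copy of a delivery. The function name `metadata_route` and its return labels are hypothetical, and searching every subdirectory for metadata.yaml is an assumption, since the guide does not say where the file must sit. The sketch does not test whether the link is active; that still needs a manual check.

```python
from pathlib import Path
from typing import Optional


def metadata_route(delivery_dir: str, catalogue_link: Optional[str]) -> str:
    """Report where the catalogue metadata for a delivery would come from.

    Returns "catalogue_link", "metadata_yaml" or "missing".
    Assumption: metadata.yaml may sit anywhere inside the delivery.
    """
    if catalogue_link and catalogue_link.strip():
        return "catalogue_link"
    if any(Path(delivery_dir).rglob("metadata.yaml")):
        return "metadata_yaml"
    return "missing"
```

A result of "missing" suggests the reviewer should ask the data provider for either a catalogue link or a metadata.yaml file.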

  5. Editing an active delivery: 

    The 'Edit' page allows the reviewer to change elements of the delivery: the name that will be given to the directory in the archive, the data centre associated with the delivery (and therefore which reviewers are responsible for approving it), and the catalogue link to a draft record.

  6. Reviewing an active delivery:

    The review page is identical to the "Upload data" page (b) of the data provider workflow. Reviewers can apply the same tools to the dataset using the blue buttons, i.e. unzip all, remove empty directories, fix bad names, remove zero length files and remove links. Any modifications made by the reviewer at this stage should be clearly communicated to the data provider.

    Reviewers are reminded that each filename needs to conform to the CEDA file-naming convention (more information here). Incorrect separators in names (e.g. dashes '-' instead of underscores '_') should be handled by the "fix bad names" tool. Each change can be saved by clicking the edit icon next to the delete/remove button.
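To spot filenames that are likely to need the "fix bad names" tool before applying it, a reviewer could run a quick check like the sketch below. It does not reproduce the tool itself or the full CEDA file-naming convention: the allowed character set (letters, digits, underscores and dots) is an assumption based only on the description above, and the function names are hypothetical.

```python
import re

# Assumption: letters, digits, underscores and dots are acceptable;
# anything else (spaces, dashes, unusual characters) is flagged.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_.]+")


def needs_renaming(filename: str) -> bool:
    """Return True if the filename contains characters outside the assumed set."""
    return _SAFE_NAME.fullmatch(filename) is None


def flag_bad_names(filenames):
    """Return the filenames that would probably need fixing, in input order."""
    return [name for name in filenames if needs_renaming(name)]
```

For instance, `flag_bad_names(["ok_file.nc", "bad-name.nc", "has space.csv"])` flags the second and third names.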
  7. Approving a delivery for ingest:

    If the requirements for the dataset delivery have been met, all components have been checked and the reviewer is satisfied, select the green 'Approve' button in the grey 'Reviewer' tab. Several ingest streams are available, depending on the processing that the dataset needs.

    • Email DC - manual ingestion - sends a message to the CEDA helpdesk (data.management@ceda.ac.uk). The delivery name may need to be changed to a more suitable descriptor for the archive.
    • CEDA Standard - automatic ingestion - triggers ingest to the archive: a job attempts to archive the dataset. The delivery name becomes the corresponding repository name in the archive. The end-point in the archive is assigned the current year (note: the year of ingestion, not the year of deposit).
    • NGDC catalogue grant - an NGDC ingest stream that takes the NERC grant number from the BGS catalogue page, provided that a link has been given and the page is correctly formatted. A metadata.yaml file should not be included with this delivery, as metadata is harvested from the existing BGS catalogue record for the dataset.

      Note: If the data centre associated with the dataset delivery is not CEDA (viewable on the delivery Summary page), only the first option in the dropdown list ("Email DC") will be visible. This means the delivery will need to be manually reviewed and ingested by data centre staff.
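The visibility rule in the note can be written down as a small lookup. This is a sketch of the rule as described in this guide, not the arrivals service's own code: the function name `visible_ingest_streams` is hypothetical, and it assumes that CEDA deliveries see all three options and that the data centre label is the abbreviation "CEDA".

```python
INGEST_STREAMS = ["Email DC", "CEDA Standard", "NGDC catalogue grant"]


def visible_ingest_streams(data_centre: str) -> list:
    """Return the ingest streams a reviewer would expect to see.

    Assumption: CEDA deliveries see every option; all other data
    centres see only the first one ("Email DC").
    """
    if data_centre.strip().upper() == "CEDA":
        return list(INGEST_STREAMS)
    return INGEST_STREAMS[:1]
```

For a BODC delivery, for example, the function returns only "Email DC", matching the note above.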
  8. Rejecting a delivery:

    If the reviewer is not satisfied that the dataset meets the requirements for approval, the delivery can be rejected. In this case, it is particularly important to give feedback to the data provider so that issues can be resolved as efficiently as possible. This can be done in the message interface.

    If only minor changes are needed, the reviewer may advise the data provider to make these offline, upload the new files and resubmit the existing dataset delivery.
    If major changes are needed, the reviewer should explain why the dataset does not meet the required standard for the CEDA archive and recommend a course of action.
  9. After approval, or after rejection...

    If approved:

    Once the dataset is ingested to the archive, an email is sent to the data submission address for the associated data centre. This email includes the location (end-point) of the delivery in the archive.

    CEDA: data.management@ceda.ac.uk
    PDC:  polardatacentre@bas.ac.uk
    BODC:  data.management@ceda.ac.uk
    EIDC:  info@eidc.ac.uk
    NGDC: ngdc@bgs.ac.uk
    The data provider is sent an email to say that their delivery has been approved. At the moment this email contains no identifying information about the delivery other than its name (i.e. no catalogue link, creation date, or other delivery components from the summary page).

    If rejected:

    The data provider will receive an email from the data centre notifying them that their delivery has been rejected (this is sent to the email address registered to the user who submitted the delivery). The data centre does not receive an email, but the delivery will have a 'rejected' tag in the arrivals deliveries inbox.

    The reviewing process in this guide should be repeated if the delivery is resubmitted for approval.

    After the dataset is in the CEDA archive, the reviewer needs to conduct a final check on the catalogue record for content and completeness, and submit the catalogue record for MOLES review & publishing using this form.

Additional documentation/help pages


If at any point in this guide you are unsure of the proper process, please contact the CEDA data management helpdesk at data.management@ceda.ac.uk. The helpdesk handles communication with projects and data arriving via arrivals, and will be able to help you with your issue.
