Levels of Data Curation
The CEDA Archive seeks to offer long-term archiving for data, but the level of curation each dataset needs depends on how likely it is to be re-used. Generally, better-prepared data are accessible to a wider pool of future users, but CEDA also recognises that limited resources constrain how much preparation is possible.
The table below shows the three levels of curation offered by the CEDA Archive, the level of re-use expected for each, and the data standards and conventions the data must meet.
| | Reference | Structured | Interoperable |
|---|---|---|---|
| Suitable for | Complete datasets | Key or ongoing datasets | Core community datasets |
| Anticipated data re-use level | Low | Medium-high | High |
| Discoverable in CEDA data catalogue, Google Scholar, NERC Data Catalogue, Data.gov.uk etc. | ✓ | ✓ | ✓ |
| DOI-able dataset (citable in papers) | ✓ | ✓ | ✓ |
| Web, FTP download | ✓ | ✓ | ✓ |
| Direct JASMIN access | If permitted | If permitted | If permitted |
| Community-wide/archive-quality format (e.g. netCDF) | Encouraged | ✓ | ✓ |
| File metadata follows conventions (e.g. CF) | Encouraged | ✓ | ✓ |
| Extra data tools (e.g. subsetting) | | | ✓ |
Reference
These are data that are discoverable and downloadable, but users are left to work out some usability issues themselves. CEDA will create a catalogue entry and add the data files to the archive. This is a suitable option if mass interest in the data is unlikely and their principal purpose is to provide evidence supporting a publication. Data in this category should be small in volume (< 1 TB).
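Before proposing a dataset at the Reference level, a depositor might want to confirm it stays under the 1 TB volume guideline. The sketch below is a hypothetical pre-submission helper, not a CEDA tool; it assumes 1 TB means 10**12 bytes (the text does not say whether decimal or binary units are meant) and counts only regular files, skipping symbolic links.

```python
import os

# Assumption: "1 TB" interpreted as 10**12 bytes (decimal units).
REFERENCE_LIMIT_BYTES = 10**12


def total_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root.

    os.walk does not descend into symlinked directories by default,
    and symlinked files are skipped explicitly so nothing is counted twice.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fits_reference_level(root: str, limit: int = REFERENCE_LIMIT_BYTES) -> bool:
    """Hypothetical check: True if the dataset is strictly below the size limit."""
    return total_size_bytes(root) < limit
```

For example, `fits_reference_level("/path/to/my_dataset")` returns `True` for a directory tree holding less than 10**12 bytes of files. Volume is only one criterion; the expected level of re-use still decides which curation level is appropriate.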
Structured
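In addition to being Referenced, these data are stored in a community-wide, archive-quality format (e.g. netCDF), and their file metadata follow conventions (e.g. CF).
Minimum qualification: evidence of use of similar datasets by CEDA core communities.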
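The table asks Structured data to use file metadata that follow conventions such as CF. As a rough illustration only, the sketch below checks a few CF-style attributes on metadata already read into plain dictionaries (for instance, from a netCDF file with a netCDF library). The function name and the attribute dictionaries are hypothetical, and the rules are deliberately simplified: a full CF compliance checker tests far more, and CF does not require `units` on every variable (dimensionless quantities, for example).

```python
from typing import Dict, List

Attrs = Dict[str, str]


def check_cf_style_metadata(global_attrs: Attrs, variables: Dict[str, Attrs]) -> List[str]:
    """Return a list of simplified CF-style metadata problems (empty if none found).

    Checks only that:
      * the global 'Conventions' attribute names a CF version (e.g. 'CF-1.8');
      * each variable has a 'standard_name' or 'long_name';
      * each variable has a 'units' attribute (a simplification; see lead-in).
    """
    problems: List[str] = []
    conventions = global_attrs.get("Conventions", "")
    if not conventions.startswith("CF-"):
        problems.append("global attribute 'Conventions' should name a CF version, e.g. 'CF-1.8'")
    for name, attrs in variables.items():
        if "standard_name" not in attrs and "long_name" not in attrs:
            problems.append(f"variable '{name}' has neither 'standard_name' nor 'long_name'")
        if "units" not in attrs:
            problems.append(f"variable '{name}' has no 'units' attribute")
    return problems
```

For example, `check_cf_style_metadata({"Conventions": "CF-1.8"}, {"tas": {"standard_name": "air_temperature", "units": "K"}})` returns an empty list, while omitting the `Conventions` attribute would add one problem to the result.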
Interoperable
In addition to being Referenced and Structured, these data are connected to specific community tools or systems that enable better discovery or processing. Examples include climate model data in ESGF, MIDAS land surface station data in the CEDA WPS, and aircraft data in the Flight Finder tool.
Minimum qualification: evidence of use of similar datasets by CEDA core communities, community tool specifications, and some evidence that the data will fit the tools.