Preparing data for archiving

Before data can be deposited in the CEDA archives, they must be prepared so that they are suitable for long-term preservation. This typically means providing the data in an archive-suitable file format, preferably one that follows metadata conventions such as the CF conventions. Supporting information such as instrument logs, calibration information, site location and flight logs should also be gathered, as it gives the context needed to interpret the data.

The following notes give some guidance on each of these steps.

When to start considering data archiving

Ideally, data archiving should be considered as early as possible. This helps ensure that relevant information (e.g. location information, instrument set-up, calibration information) is captured rather than lost. Taking care early on to prepare the data in the right format can also save time later.

Formatting data for archival

To ensure the long-term usability of data in CEDA's archives, it is essential that the data are provided in suitably formatted files. CEDA encourages the use of well-structured formats such as HDF and netCDF for binary data, and NASA Ames or BADC-CSV for ASCII-encoded data, though there are some other formats that we readily accept too. For further details on our standard formats for archiving, see our "File Formats Demystified" help page.

If your data are in a format that is widely used within your research community and that format is not listed on the page above, it may still be suitable for archival purposes. Please contact your CEDA liaison officer to discuss this further.

File metadata

An important part of preparing data for archiving is providing well-structured metadata within the file. CEDA encourages the use of international standards such as the Climate and Forecast (CF) conventions and Dublin Core. Most of our standard formats are already set up to follow these standards, and depositors are encouraged to use them.

Following these standards allows much more powerful use of the data provided. For example, it allows CEDA to extract information such as temporal and spatial extent and parameter lists directly from the files. This information can in turn support data discovery and visualisation tools, making the data more accessible and usable by others. In addition, many existing data analysis tools can help exploit data that follow these standards.
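As a rough illustration of the kind of information that can be derived from well-described coordinate data, the sketch below computes a temporal and spatial extent from plain Python lists. The function name summarise_extent and the sample values are hypothetical, and this is not CEDA's actual extraction code. In a real file the values would be read from the time, latitude and longitude coordinate variables.

```python
from datetime import datetime, timezone


def summarise_extent(times, lats, lons):
    """Return the temporal and spatial bounding extent of a set of points.

    Hypothetical helper for illustration only. Longitudes are treated
    naively, so data crossing the 180 degree meridian are not handled.
    """
    if not (times and lats and lons):
        raise ValueError("times, lats and lons must all be non-empty")
    return {
        "time_start": min(times).isoformat(),
        "time_end": max(times).isoformat(),
        "lat_min": min(lats),
        "lat_max": max(lats),
        "lon_min": min(lons),
        "lon_max": max(lons),
    }


# Hypothetical sample values standing in for coordinate variables.
times = [
    datetime(2020, 1, 1, 0, 0, tzinfo=timezone.utc),
    datetime(2020, 1, 1, 6, 0, tzinfo=timezone.utc),
]
lats = [51.1, 51.6, 50.9]
lons = [-1.4, -1.2, -1.9]

extent = summarise_extent(times, lats, lons)
```

Because the result is a plain dictionary, it could be passed on to whatever catalogue or discovery record is being populated.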

Checking your files

Tools are available for some formats to check files for compliance with format and/or metadata standards. These include the following:

Format      File compliance tool
BADC-CSV    BADC-CSV file checker service
NASA Ames   NASA Ames file checker service
NetCDF      CEDA netCDF CF-compliance checking service
            NCAS-CMS CF-netCDF checking service
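Before submitting files to one of these services, a quick local check can catch obvious omissions. The sketch below assumes the file's global attributes have already been read into a Python dictionary (how they are read depends on the format and library used). It reports which of the global attributes described in the CF conventions (Conventions, title, institution, source, history, references, comment) are missing or empty. The function name missing_cf_globals and the example values are hypothetical, most of these attributes are recommended rather than mandatory, and this is not a substitute for the compliance tools listed above.

```python
# Global attributes described in the CF conventions.
CF_GLOBAL_ATTRIBUTES = (
    "Conventions",
    "title",
    "institution",
    "source",
    "history",
    "references",
    "comment",
)


def missing_cf_globals(attrs):
    """Return the CF global attribute names that are absent or blank in attrs."""
    return [
        name
        for name in CF_GLOBAL_ATTRIBUTES
        if not str(attrs.get(name, "")).strip()
    ]


# Hypothetical attributes, as they might be read from a file header.
example_attrs = {
    "Conventions": "CF-1.8",
    "title": "Example surface temperature observations",
    "institution": "",
    "source": "Hypothetical weather station",
}

missing = missing_cf_globals(example_attrs)
```

An empty list means all of the listed attributes are present and non-blank. It says nothing about whether their values are sensible.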

