Navigating the CEDA archives
The CEDA archives (including the SPARC data centre) hold over 3 PB of data in more than 180 million files. To manage this volume the archive is highly structured, which allows data to be archived carefully but can make navigation a hurdle for some users. The following notes provide general guidance on the archive structure to aid navigation.
A note on supporting information and data discovery
Although the archive is well structured, CEDA also have an extensive searchable catalogue service for users wishing to find data across the archive. This catalogue service has been designed to complement the archive content by providing links to supporting documentation and to related data that may not be connected within the archive itself (e.g. by finding all data produced by a particular instrument or facility). To find out more about data discovery see the help page on "Finding Data".
From data centres to datasets
The CEDA archive has a hierarchical structure whose top level splits into the data centres and the requests area:
/badc
/neodc
/requests
/sparc
Below the BADC, NEODC and SPARC data centre paths the archive is (mostly) structured as follows:
/<data centre>/<data collection>/<metadata|data|etc. splitting>/<M1>/<M2>/<dataset>/<split1>/<split2>/<files>
Where:
- data centre: is the data centre responsible for archiving the data
- data collection: is a directory under which data have been collected together - e.g. all data from a project, instrument, facility - this collection is usually described by a "Dataset Collection" in the CEDA data catalogue
- metadata|data|etc. splitting: below the data collection level there is (usually) a splitting into a number of different directory types:
  - data - contains the actual data for the dataset
  - metadata - contains files that can be used by external services to interrogate archive contents
  - docs - holds relevant documentation for the dataset collection
  - images/quick_looks - holds images or plots relevant to the data, to help the user see what is in the archive ahead of downloading or to support an external service
- <M1>/<M2>/<dataset>: is a splitting down to some point in the archive below which all the data are related by some common theme. The M1/M2/etc. splitting breaks the data down into more manageable, logical groupings. For example, for an international modelling comparison project the M1/M2 splitting might be first by institute, then by model, before arriving at a level that holds all the data for a given experiment. It is this level that is then described by the data catalogue's "Dataset" entries.
- <split1>/<split2>/<files>: below the <dataset> directory there is sometimes a need to further break down the data into manageable sections (we typically aim for fewer than 1000 files in the lowest level directory to aid archive use). The <split1>/<split2> directories could, for example, be by year/month/day.
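The layout described above can be illustrated with a small path-splitting helper. The following is a minimal Python sketch, not a CEDA tool: the function name `describe_archive_path`, the `n_groupings` parameter and the example path are all hypothetical. Because the archive only (mostly) follows this pattern and the number of M levels varies between collections, the caller has to state how many grouping levels to expect, and the last component is assumed to be a file.

```python
from pathlib import PurePosixPath

# Top-level data centre directories named in this guide.
DATA_CENTRES = {"badc", "neodc", "sparc"}


def describe_archive_path(path, n_groupings=2):
    """Label the components of an archive path (hypothetical helper).

    Assumes the layout
    /<data centre>/<data collection>/<area>/<M1>/.../<dataset>/<splits...>/<file>
    with `n_groupings` M-level directories and a file as the last component.
    """
    parts = PurePosixPath(path).parts
    if not parts or parts[0] != "/":
        raise ValueError("expected an absolute archive path")
    names = parts[1:]
    # centre + collection + area + groupings + dataset + file
    if len(names) < 5 + n_groupings:
        raise ValueError("path is too short for the assumed layout")
    if names[0] not in DATA_CENTRES:
        raise ValueError(f"unknown data centre: {names[0]!r}")

    dataset_index = 3 + n_groupings
    return {
        "data_centre": names[0],
        "collection": names[1],
        "area": names[2],  # e.g. data, metadata, docs
        "groupings": list(names[3:dataset_index]),
        "dataset": names[dataset_index],
        "splits": list(names[dataset_index + 1:-1]),  # e.g. year/month
        "file": names[-1],
    }


if __name__ == "__main__":
    # Hypothetical path, used only to show the labelling.
    example = (
        "/badc/example-collection/data/institute-a/model-b/"
        "experiment-c/2020/01/file.nc"
    )
    for key, value in describe_archive_path(example).items():
        print(f"{key}: {value}")
```

Passing the number of grouping levels explicitly keeps the sketch honest about the fact that the depth of the M1/M2 splitting cannot be inferred from the path alone.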
Data reuse in the archive
Data are always archived at exactly one location, and most of the time they can be found in only one part of the archive. However, sometimes datasets may be of use within other collections too. In these cases users may find that they can navigate to the same data by two or more different routes. This is achieved through the use of symbolic links in the archive.
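Where the archive is available as a mounted file system, symbolic links can be resolved to check whether two routes lead to the same archived file. The following is a minimal Python sketch using only the standard library; the function names `canonical_path` and `same_archived_file` are hypothetical, and direct file system access to the archive is an assumption.

```python
import os


def canonical_path(path):
    """Return the path with all symbolic links resolved."""
    return os.path.realpath(path)


def same_archived_file(path_a, path_b):
    """Return True if both routes resolve to the same location."""
    return canonical_path(path_a) == canonical_path(path_b)
```

Comparing resolved paths rather than the paths as typed avoids counting, or downloading, the same data twice when it is reachable through more than one collection.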
Access types and limitations
Large parts of the CEDA archives are open access for anyone to use. Other parts require users to have a CEDA account first (as does use of the FTP service for any data), and yet other parts are further restricted, requiring users to apply for access. The access type varies across the archive because of the differing requirements of the data providers, in order to respect their intellectual rights to first use of the data or their ownership of the data (e.g. all Met Office data remains the property of the Met Office and thus access is strictly on a licensed basis only).
Whilst navigating the archive users may be asked to log in to their account (or first register for a CEDA account) to gain further access, or may be informed that their access rights are not sufficient to proceed further down the archive at that point. In the latter case the user should find the relevant Dataset entry in the CEDA data catalogue and apply for access to the restricted resource - for further details about this please see our page on data access.
Further details are available in the "Accessing public and restricted datasets" section of the user guide.