Directory Structure Conventions
Introduction
To ensure that MOLES metadata entries link logically through to their endpoints within the archive, it is important that the archive structure used for any dataset follows a recognised structure model. The CEDA Archive Structure Model was developed by examining a number of typical use cases from within the archive and then seeing how these map onto the MOLES 2/MOLES 3 concepts. Commonalities between the existing "datasets" within the CEDA archive permit an Archive Structure Model to be formulated, and this presents the opportunity to align the MOLES archive with MOLES 3 concepts and with other archive structure paradigms, e.g. the CMIP5 archive structure.
The attached file is a presentation summarising the conversations that took place during the development of the CEDA Archive Structure Model. For further clarification, discuss this with Sam Pepler, Graham Parton and Spiros Ventouras.
NOTE: 2013-04-23: how to deal with super-collections of data, such as ESA CCI products, has yet to be fully formulated
Basic Structure
The archive structure basically breaks down as follows:
data centre/data entity path/{data}/Reference Path/File Management Path
or more explicitly:
/<data centre>/<primary Data Entity>/<metadata|data>/<M1>/<M2>/<deployment>/<split1>/<split2>/<files>
NB: all parts listed above after the <metadata|data> level apply only to the <data> path.
Where:
<data centre> is the data centre that the data are part of, e.g. neodc/badc/cmip5.
<primary Data Entity> is the principal MOLES data entity with which the data are associated. NB this may not turn out to be the most logical place by the end of a project, but it should be established as well as possible before data arrive, based on available knowledge.
<metadata|data> is used to provide a top-level split between the actual data and other types of files, e.g. CMSL metadata files.
<Reference Path> is a hierarchical metadata split that finishes at a referenced object: the deployment. See below for further information on how this links to MOLES 3 concepts.
<File Management Path> is a locally useful way of splitting up files within a given deployment, e.g. by YYYY/MM/DD. The depth of splitting should strike a balance: enough files in each directory to be sensible, but not an excessive number.
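As a rough illustration of how the components above combine, the following Python sketch assembles a <data> path from its parts. The function names build_data_path and date_split are hypothetical and not part of any CEDA tooling. Treating gg/as as the Reference Path and YYYY/MM/DD as the File Management Path for ERA-Interim is an assumption made for the example.

```python
from datetime import date
from pathlib import PurePosixPath


def date_split(d):
    """Return a YYYY/MM/DD style File Management Path as a tuple of strings."""
    return (d.strftime("%Y"), d.strftime("%m"), d.strftime("%d"))


def build_data_path(data_centre, data_entity, reference_path, file_management_path=()):
    """Assemble /<data centre>/<primary Data Entity>/data/<Reference Path>/<File Management Path>.

    Hypothetical helper: only the <data> branch is handled, because the
    Reference Path and File Management Path apply only to that branch.
    """
    return PurePosixPath(
        "/", data_centre, data_entity, "data", *reference_path, *file_management_path
    )


# Assumed split for illustration: gg/as is the Reference Path,
# 2009/09/26 is the File Management Path.
example = build_data_path(
    "badc", "ecmwf-era-interim", ("gg", "as"), date_split(date(2009, 9, 26))
)
print(example)
```

Keeping the Reference Path and the File Management Path as separate arguments mirrors the model: the first is fixed by the metadata hierarchy, while the second can be chosen per deployment to keep directory sizes sensible.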
Links with MOLES entries
Within the MOLES catalogue there are now two areas where links to the archive can be made. These are from the Data Entity and, to provide the opportunity to prepare for MOLES 3 concepts being implemented, from the Deployments.
Data Entities should have the URLs pointing down to the /<data centre>/<data entity>/ part of the archive, while Deployments should follow the relevant <Reference Path> to the relevant <deployment> reference.
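The two link targets can be derived from any path beneath a deployment, as the minimal sketch below shows. The function name moles_link_targets and its reference_depth parameter are hypothetical: the number of <Reference Path> levels differs between datasets, so it has to be supplied by the caller. The example assumes, for illustration only, that a FAAM flight directory is the deployment.

```python
from pathlib import PurePosixPath


def moles_link_targets(path, reference_depth):
    """Return (data entity link, deployment link) for an archive <data> path.

    reference_depth is the number of directory levels in the <Reference Path>,
    the last of which is the <deployment>. This is an assumed input; it is not
    something the path itself can tell us.
    """
    parts = PurePosixPath(path).parts  # e.g. ('/', 'badc', 'faam', 'data', '2009', ...)
    if len(parts) < 4 + reference_depth or parts[0] != "/" or parts[3] != "data":
        raise ValueError(f"not a recognised <data> path at that depth: {path}")
    entity = PurePosixPath(*parts[:3])                       # /<data centre>/<data entity>
    deployment = PurePosixPath(*parts[:4 + reference_depth])  # .../data/<Reference Path>
    return f"{entity}/", f"{deployment}/"


# Assumption: YYYY/<flight-number> is the Reference Path, so reference_depth is 2.
entity_link, deployment_link = moles_link_targets(
    "/badc/faam/data/2009/b462-jun-24/core_processed", 2
)
print(entity_link)
print(deployment_link)
```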
Examples
The following illustrate how this model is used in practice:
/badc/faam/data/YYYY/<flight-number>/<products type>/files
e.g. /badc/faam/data/2009/b462-jun-24/core_processed/<files|sub-directories>
/badc/appraise/data/<project>/
e.g. /badc/appraise/data/adient/bae-146/b462-2009-jun-24 -> /badc/faam/data/2009/b462-jun-24
/badc/fgam/data/instrument/data/YYYY/MM/files
e.g. /badc/fgam/data/man-radar-1290Mhz/data/2002/10/man-radar-1290Mhz_20021017_low.nc
/badc/ecmwf-era-interim/data/<grid types>/<analysis|forecast><level type>/YYYY/MM/DD/files
e.g. /badc/ecmwf-era-interim/data/gg/as/2009/09/26/ggas200909261200.nc