Archiving Overview
Data management takes place at all points in the data lifecycle. Some of these points exist upstream of the data centre and may not lead to data being archived for long-term preservation. However, the archive may provide guidance, support and services at some or all points as required.
Various items in the CEDA Archive Operational Manual cover work undertaken by CEDA Archive staff and exist to document workflows, policies and services as needed to aid data management.
Data Lifecycle - CEDA Archive roles and links to sections
For data handled by the CEDA Archive, the following points of the data lifecycle are important. Links to relevant documentation for each point are given where needed. Note that there may be ad hoc engagements with specific projects/facilities regarding data management that fall outside the scope of these operational manual pages.
Initial scoping 1 - Grant awards and Outline Data Management Plans
For NERC grant awards: As part of the submission process for a NERC award, applicants are required to submit an Outline Data Management Plan (ODMP). Requirements and guidance on preparing ODMPs are available in the award system.
- Details on ODMPs etc for applicants: https://nerc.ukri.org/research/sites/environmental-data-service-eds/dmp/
- Enquiries about additional support (e.g. additional services such as database development or a specialist in project data management) and about costings to be included in the grant proposal may be received by the helpdesk. These should be passed to the Archive Manager.
- See DataMad documentation for related details.
For non-NERC awards: There is no official role in the grant application process, as there is for NERC awards, but enquiries on costs etc. may be made via the helpdesk or other routes. These should be passed to the Archive Manager in the first instance to determine whether the request is within the scope of the data centre (see Acquisitions and Retention policy) and, if so, the associated costings and subsequent actions.
Initial scoping 2 - Data Management Plans
For successful NERC applications, or agreed work for non-NERC grant awards, a second scoping phase then takes place to establish the required Data Management Plan (DMP). See DMP op man pages. The NERC data value checklist may be useful in discussions to determine which data should be archived. Additionally, for model/computational output where data volumes may be problematic, see the selection guidance notes.
Continuing communications
Once the DMP has been agreed, communications with the data providers should continue as required to ensure that timely support is provided for their data management, preparation for archival and delivery to the archive. Additionally, supporting documentation should be captured. See JIRA guidance and data management helpdesk pages for details on the stages of communication and how to use the system.
Capturing Supporting Docs
Data stored for the long term and for re-use require supporting information to aid their interpretation. Besides catalogue information, additional supporting material may need to be captured. Such details are best captured during data production if possible. However, not all supporting material needs the same type of archiving. See the 'Where to put data docs' guidance for a workflow to establish where to put supporting materials.
Ingest
Guidance for users is available on the 'Archiving data with CEDA' help docs site.
For the CEDA side, see the General Ingest Process documentation. If there are specific detailed/ongoing processes that require additional documentation, please create an entry under the Specific Ingest Processes category using the template provided.
If a new format is required then undertake a format review.
Tools to use for ingestion, checking etc. are listed under the tools section.
Cataloguing and indexing
Archiving data also requires the content to be catalogued to aid discovery and indexed to enable its use/access through archive tools. See CEDA data catalogue (MOLES) docs for details on preparing content for the data catalogue. Indexing of content happens automatically through the File Based Index (FBI) system, triggered via ingest and associated functions that utilise the Deposit Client. See the 'Filling in missing datasets for data.ceda' documentation if items require re-indexing (e.g. to pick up parameters). Where metadata cannot be scraped from the FBI (e.g. incomplete metadata, missing content, removed content, external content, offline content), enter details into the CEDA Manual Metadata Store (CMMS).
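The choice between relying on the automatic index and making a manual CMMS entry can be sketched as a small decision function. This is illustrative only: the names ContentStatus and metadata_route are hypothetical and do not correspond to any CEDA tool or API. The status values simply mirror the cases listed above.

```python
from enum import Enum


class ContentStatus(Enum):
    """Hypothetical states for an archive item, mirroring the cases in the text."""
    INDEXED = "indexed"
    INCOMPLETE_METADATA = "incomplete metadata"
    MISSING = "missing content"
    REMOVED = "removed content"
    EXTERNAL = "external content"
    OFFLINE = "offline content"


def metadata_route(status: ContentStatus) -> str:
    """Return where the item's metadata should come from.

    Assumption: anything the FBI can scrape stays with the FBI;
    every other case needs a manual entry in the CMMS.
    """
    if status is ContentStatus.INDEXED:
        return "FBI"
    return "CMMS"


# Example: offline content cannot be scraped, so it is routed to the CMMS.
route = metadata_route(ContentStatus.OFFLINE)
```

In practice the decision is made by archive staff following the linked documentation; the sketch only makes the rule explicit.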
Archive management
Storage media vary in the CEDA Archive and some data are stored in the Near-Line Archive (NLA). See here for details on using the NLA for storage (moving data to tape). A user help guide for using the NLA is available here.
End of data lifecycle
Data may be deacquisitioned according to the CEDA Archive withdrawal policy. In such cases it is essential to follow the 'Remove data procedure', both for simple removal of empty directories (i.e. data failed to arrive) and for actual removal of content. This ensures that a full audit trail of data management activities is maintained.