Data Preservation Policy
Introduction
This document sets out the digital preservation policy of the Centre for Environmental Data Analysis Archive (CEDA Archive). The CEDA Archive is a Natural Environment Research Council (NERC) data centre hosted by the Science and Technology Facilities Council (STFC). The aim of the policy is to ensure, in a sustainable way, the longevity of the digital information assets held by the data centre by addressing the factors that risk making them unusable and/or inaccessible.
Preservation Policy
We aim to:
- Maintain the integrity of the data by regularly auditing checksums of data files.
- Ensure access to all data is secured at the level of information security appropriate to those data.
- Ensure data are accompanied by sufficient documentation to enable their re-use for analytical and research purposes.
- Ensure data are checked and validated according to appropriate data and documentation ingestion procedures.
- Ensure data are catalogued according to appropriate metadata standards.
- Provide suitable storage media for long-term data management, migrating data to new media as needed.
- Ensure we have sufficient rights to preserve data and to distribute them under an appropriate licence, since we do not own the data we hold.
Retention
Retention policy varies for different data sets. Generally:
- Observations, as unrepeatable measurements of the Earth system, are retained indefinitely.
- Model data, which can have a limited shelf-life, may be withdrawn.
- Third-party data, where another primary archive exists, may be kept as a rolling archive of the most recent data, or reviewed if usage falls.
- Data that takes significant resource to keep, such as very large datasets, may need special consideration.
- Exceptionally, data may be withdrawn for a number of reasons; see the CEDA withdrawal policy for further details.
External Policy Considerations
As a core activity in the CEDA Archive, preservation does not exist in isolation. It needs to take account of:
- The NERC Data Policy, the NERC and STFC Information Security policies, and user needs.
- Legislation that applies particularly to data repositories: the Data Protection Act 2018 (DPA 2018), the General Data Protection Regulation (GDPR), the Freedom of Information Act 2000, and the Environmental Information Regulations 2004.
- Good practice learnt from other repositories, for example Core Trust Seal certification (presently under submission) and the Open Archival Information System (OAIS) reference model.
Preservation Strategy
The preservation strategy of the CEDA Archive aims to maintain a flexible preservation system that can evolve to meet the demands of changing technology and developing user expectations. The CEDA Archive has chosen to implement a preservation strategy based upon open and available file formats. The same ingestion procedure is used for all data resources, and no judgement is made on the scholarly value of datasets once they have been identified as suitable for deposit with the CEDA Archive. All datasets accepted for deposit must be accompanied by supporting documentation of sufficient quality to enable re-use over the long term. To reduce the risk of obsolescence, files are only accepted in non-proprietary formats.
Migration to new media is performed as technologies progress, but the data files themselves are left unaltered wherever possible. Files are checksummed to verify that their contents have not changed.
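The checksum auditing described above can be illustrated with a short Python sketch. This is a minimal example under stated assumptions, not CEDA's actual tooling: the use of SHA-256, the manifest held as a dictionary mapping relative file paths to stored hex digests, and the function names `file_checksum` and `audit` are all hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

# Read files in 1 MiB chunks so very large files need not fit in memory.
CHUNK_SIZE = 1 << 20


def file_checksum(path: Path, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file's contents (SHA-256 is an assumed default)."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(manifest: dict[str, str], root: Path, algorithm: str = "sha256") -> dict[str, str]:
    """Compare stored checksums in a (hypothetical) manifest with freshly computed ones.

    Returns a mapping of relative path -> problem ("missing" or "mismatch").
    An empty result means every file listed in the manifest passed the audit.
    """
    problems: dict[str, str] = {}
    for rel_path, expected in manifest.items():
        path = root / rel_path
        if not path.is_file():
            problems[rel_path] = "missing"
        elif file_checksum(path, algorithm) != expected:
            problems[rel_path] = "mismatch"
    return problems


if __name__ == "__main__":
    # Demonstration on a throwaway directory: one intact file, one missing file.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "obs.nc").write_bytes(b"example data")
        manifest = {"obs.nc": file_checksum(root / "obs.nc"), "gone.nc": "0" * 64}
        print(audit(manifest, root))  # {'gone.nc': 'missing'}
```

Because the stored digests are recorded once at ingestion, the same audit can be rerun after a media migration or on a regular schedule: any file whose recomputed digest differs from the stored value is flagged for recovery from a backup copy.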
Online storage for the data centre repository is administered by a dedicated IT infrastructure team. The environmental conditions in which the storage media operate are tightly controlled to reduce vulnerability. Data are continuously backed up to tape libraries. A copy of the archive is kept securely off site and forms a key component of the CEDA Archive's disaster recovery and business continuity procedures, allowing data and infrastructure to be recovered from commonly anticipated threats (e.g. technical failure, human error). This arrangement also safeguards the data in the event of a more serious incident, for example if the buildings housing the data centre and/or major IT infrastructure were rendered inoperable.
Preservation of the CEDA Archive's data depends on the servers and networks that host it; this infrastructure is currently provided by the NERC JASMIN system. Servers are continually monitored and periodically reviewed to ensure timely upgrades of both hardware and software. JASMIN is run from secure premises with industry-standard measures to ensure its physical integrity, such as fire suppression and card-controlled entry. Access to JASMIN is secured by public-key SSH, and external access is only available via a secure VPN with multi-factor authentication using digital tokens.
Funding and resource planning
The CEDA Archive is, and always has been, dependent on funding from NERC to carry out its activities. It is currently funded as part of the NERC Environmental Data Service (NERC EDS), now in its second five-year phase of funding, which began in April 2023. CEDA cooperates closely with the other four NERC data centres commissioned alongside it as part of the NERC EDS: the British Oceanographic Data Centre, the Environmental Information Data Centre, the National Geoscience Data Centre and the Polar Data Centre.
Resource management for preservation of digital resources includes:
- technical infrastructure, including equipment purchases, maintenance and upgrades, software/hardware obsolescence monitoring, network connectivity etc.
- financial planning, including the strategy for financing the CEDA Archive and commitment to long-term funding
- staffing infrastructure, including recruitment, induction and ongoing staff training
The CEDA Archive makes every effort to remain up to date with relevant technological advances to ensure continued access to its collections. The CEDA Archive also implements a programme of continual improvement in how users interact with the data centre, for example through improved deposit and request functions.