Selection and retention policy for data
Why selection policy?
Why manage data at all? Data management is an active, difficult, long-term and expensive process, so there need to be convincing reasons to engage in such a massive undertaking. If the answer to one or more of the following questions is yes, then the data are candidates for professional data management.
- Primary use : A project produces and uses data to do its business. Some basic data management needs to be done by the project itself to make sure its data are looked after within the project. The lifetime of this management is the life of the project. Because the need to share is only within the project there is little drive towards more universal standards.
- Use facilitation : Is the management of the data by a project team likely to be too onerous for them or result in duplication of effort with other NERC funded activities? If so keep the data for a year after the project ends.
- Community Re-Use : Are there — or are there likely to be in the future — users from the subject community in which the data originated, who might use the data without having one of the original team involved as co-investigators (or authors)? If so keep the data for 5 years beyond the end of the project.
- General Re-Use : Is there, or is there likely to be in the future, a community of potential users who might use the data without having one of the original team involved? If so then keep the data indefinitely.
- Historical Reference : Does the data have some historical importance? Some data may become landmarks, in some way, along the route of scientific knowledge. If so then keep the data indefinitely.
- Legal Reference : Is there a legal reason to keep the data? Data may have been quoted to make a statement that might be challenged or there may be specific legislation that demands certain retention periods. If so the data should be kept for the time specified in the legislation.
- Academic Reference : Is the data likely to be referred to in academic publications? These publications might be challenged scientifically and the data cited should therefore be kept for evidential reasons. If so the data should be kept for 10 years.
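As an illustration only, the retention periods in the list above can be written down as a small lookup. The sketch below makes several assumptions that the policy text does not state: that when several criteria apply the longest period wins, that the 10-year academic reference period is counted from the end of the project like the others, and that "indefinitely" can be represented as infinity. The names `RETENTION_YEARS` and `retention_years` are hypothetical.

```python
import math

# Years to keep data beyond the end of the project, per criterion.
# Assumption: all periods are counted from the end of the project.
RETENTION_YEARS = {
    "primary_use": 0,                  # life of the project only
    "use_facilitation": 1,
    "community_reuse": 5,
    "general_reuse": math.inf,         # keep indefinitely
    "historical_reference": math.inf,  # keep indefinitely
    "academic_reference": 10,
}


def retention_years(criteria, legal_years=None):
    """Return the retention period, in years, for the criteria that apply.

    criteria    -- iterable of keys from RETENTION_YEARS
    legal_years -- period demanded by legislation, if a legal reason applies

    Assumption: the longest applicable period is the one that governs.
    """
    periods = [RETENTION_YEARS[name] for name in criteria]
    if legal_years is not None:
        periods.append(legal_years)
    return max(periods, default=0)
```

For example, data that meet the use facilitation and academic reference criteria would be kept for 10 years under these assumptions, while any data meeting the general re-use criterion would be kept indefinitely.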
Data management issues
- Backup - Bit level security of the data.
- Fixity - Bit level verification of the data.
- Identity - Labelling data as belonging to a data set.
- Format - Use of standardised formats to make data sharing more practical.
- Metadata conventions - Use of standard vocabularies to make machine readable sharing practical.
- Discovery - Creation and provision of metadata to enable data to be found.
- View - Enabling a graphical view of data to aid selection.
- Access - Access to data.
- Licence - Licensing of data to comply with policy and law.
- Access control - Applying access constraints so that authors can have confidence in licence enforcement and to track usage.
- Reporting - Supplying authors and funders with numbers to back up their decisions.
- Context metadata and documentation - Allowing independent use of data.
- Retention - Review and removal of data to comply with policy and law.
- Media migration - Ensure data are still readable over time.
- Format migration - Ensure data are still usable over time.
- Governance - Agree with suppliers and funders the correct rules to apply to the data.
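To make the Fixity item in the list above concrete, bit-level verification is commonly done by recording a checksum when data are ingested and recomputing it later. The sketch below is a minimal example, not a description of any particular archive's procedure. It assumes SHA-256 as the checksum algorithm, and the function names `file_checksum` and `verify_fixity` are hypothetical.

```python
import hashlib


def file_checksum(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of the file at path.

    The file is read in chunks so that large data files do not
    have to fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, expected):
    """Return True if the file still matches the checksum recorded earlier."""
    return file_checksum(path) == expected.lower()
```

In use, the digest returned by `file_checksum` would be stored alongside the data at ingest, and `verify_fixity` would be run periodically, and after any media migration, to confirm that the bits have not changed.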