Remove Data Procedure
Outline
Datasets sometimes reach the end of their useful life: for example, they are deprecated, superseded, or otherwise need to be removed from the archive.
Note - for empty directories that were created in anticipation of data that were never delivered, please see the additional section at the end of this page.
Purpose
This process outlines the necessary steps to carry out when removing a dataset from the archive in terms of:
- Pre-removal checks
- Updating the associated MOLES Catalogue record
- Removing content from the archive
Pre-removal checks
- Is the dataset a popular one? If so, users may need to be informed ahead of the removal
- What happens to back-up copies?
Catalogue Record
- Don't delete the record!
- Update the dataset record so that:
- the Publication status = Removed
- the Data Status is one of the "old" statuses - with the exception of Historical Archive, which has a specific meaning
- Add a Removed Data date
- Add a "Reason for Data Removal" in the box next to the Lineage statement (this, together with the date of removal, will be appended to the full Lineage statement in the user view and on record export, should that happen)
- Add a news item if necessary
- If possible add a link to a relevant dataset. This can be done as follows
- If superseded by another dataset (i.e. a newer version) - on the superseding dataset's record, use the "related observation" field to indicate that the newer record supersedes the one associated with the removed data
- If the replacement dataset is external, either:
- consider creating an external dataset record and showing the appropriate relationship with this record (this is a stronger, more obvious link and has a specific meaning!), OR
- simply add a link to the other record as an online resource
- Issue news item/email to users
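The reason for removal and the removal date end up appended to the Lineage statement shown to users. As a minimal sketch of how such an addendum might be composed (the function and its wording are illustrative assumptions, not the actual MOLES implementation):

```python
from datetime import date

def lineage_with_removal_note(lineage: str, reason: str, removed_on: date) -> str:
    """Append a removal note to a dataset's lineage statement.

    Illustrative only: MOLES composes this text automatically for the
    user view and on record export; the exact wording is an assumption.
    """
    note = f"Data removed on {removed_on.isoformat()}. Reason for removal: {reason}"
    return f"{lineage}\n\n{note}"
```

The key point is that the original lineage text is preserved and the removal note is appended, so the record's history remains readable.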
Removing Archive Content
- Leave the directory associated with the "data path" of the MOLES Catalogue record in place - this will act as a placeholder in case users follow a link that bypasses the catalogue. A script will keep the 00README_catalogue_and_licence.txt file up to date with a note that the data have been removed and why
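The placeholder README update can be sketched as follows. This is a hypothetical illustration of what the maintenance script does (the function name and notice wording are assumptions; the real script also keeps the catalogue and licence details current):

```python
import os
from datetime import date

# Name of the README kept in each placeholder directory (from the procedure above)
README_NAME = "00README_catalogue_and_licence.txt"

def write_removal_notice(data_path: str, reason: str, removed_on: date) -> str:
    """Append a removal notice to the placeholder README and return its path.

    Opening in append mode preserves any existing catalogue/licence text
    and creates the file if it does not yet exist.
    """
    notice = (
        f"\nNOTE: the data previously held here were removed on "
        f"{removed_on.isoformat()}.\nReason: {reason}\n"
    )
    readme = os.path.join(data_path, README_NAME)
    with open(readme, "a", encoding="utf-8") as fh:
        fh.write(notice)
    return readme
```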
Coping with empty directories (non-delivery of data)
Where directories have been created in the archive for data that then fail to materialise, and no published MOLES record exists (otherwise follow the procedure above), the workflow is to:
- Ensure that a data product entry is created for the associated DMP entry in Datamad
- Mark that entry as not delivered, adding a reason for non-delivery if possible. If the reason is not known (e.g. when clearing up the historic archive), state this too. It is best to record definitively what is and is not known to avoid future questions.
- Remove archive directory using the appropriate tool
- If the archive directory is removed by means other than the standard tools on the Ingest machines (e.g. via loss or via cedaarchiveapp), then follow-up work will be required to get the content removed from the FBI and DBI as well. Contact the Elasticsearch developer for assistance if required.
- If a MOLES record was created review the content and migrate as needed. For example, if a paper was produced instead of data then ensure the reference is recorded on an appropriate Project record/Collection record if possible.
Moved data
Where data are to be moved to a new location in the archive the workflow is:
- Re-ingest a copy of the data into the new location using the appropriate Deposit Client based tool
- Update associated catalogue record's Result object's data path
- Create a new Result object to record the old data path and link to the dataset's Result object as 'old data path'
- Run the necessary FBI re-scans on the old data paths to ensure that the old content is removed from the FBI/DBI listings (if directories do not disappear, contact the person responsible for the FBI/DBI indexes in Elasticsearch and ask them to update as needed)
- Once you are happy that the process has been completed, remove the old data
NOTE - take care with regards to filesets. If the old path contains one or more filesets in their entirety, discuss this with the archive manager before proceeding
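The "happy that the process has been completed" check before deleting the old copy can be sketched as a comparison of file listings between the old and new paths. This is an illustrative pre-deletion check only (comparing by relative path is an assumption; a real check would likely also compare sizes or checksums):

```python
import os

def relative_files(root: str) -> set:
    """Return the set of file paths under *root*, relative to it."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found

def safe_to_remove_old_path(old_path: str, new_path: str) -> bool:
    """True when every file under the old path also exists under the
    new path, i.e. the re-ingest step did not miss anything.
    """
    return relative_files(old_path) <= relative_files(new_path)
```

Running this against both paths before deletion gives a quick, scriptable sanity check that nothing was left behind by the re-ingest.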