Sending Data To CEDA
CEDA can accept data via a number of different routes, depending on the size and complexity of the data being provided. A member of the CEDA team will be assigned to liaise with the data provider, to advise on data preparation and to help set up the delivery and ingestion route. If you have not yet heard from a member of the CEDA team, please contact us to discuss this before proceeding.
For those wishing to send data for ingestion into the CEDA archive a number of steps are first required to set up the delivery and ingestion routes:
- Sign up to the Depositor Agreement - this will ensure that you are granted suitable spaces to deliver your data and that CEDA is authorised to handle the data you wish to deposit. Full conditions of deposit are given as part of the Depositor Agreement application process.
- Choose the most appropriate delivery mechanism for your deposit - your CEDA team liaison will be able to advise on this if you are unsure.
- Once the delivery route has been identified and your CEDA team liaison has informed you that it is set up, please follow the instructions that they provide. Some notes are given below to aid with this process.
Ingestion Streams - ensuring data gets to the right place
When delivering data to CEDA for ingestion into the archives you will need to place it within a particular "ingestion stream". This will ensure that the data delivered into that stream will be archived into the correct place within the archive. Please do not place any other files into that area, nor place them elsewhere without first agreeing with your CEDA team liaison.
Available Delivery Tools
Data providers can upload data to CEDA using a variety of tools. The data scientist liaising with the provider will be happy to advise on the most appropriate mechanism to use and will help to set up the data provision route.
|Delivery Route||Types of data transfer the route is suitable for||For Archiving||For GWS delivery|
|Via HTTP - the CEDA File uploader||Suitable for small-scale data providers and short-lived projects||y||n|
|FTP||Suitable for small- to medium-scale data uploads for suitable projects where RSYNC is not an option||y||y|
|RSYNC||Our preferred delivery mechanism which is suitable for all types of data provision. This route is particularly suited to regular, automated data uploading and is especially useful for very large files and dataset transfers.||y||y|
The file uploader service allows single files to be transferred to CEDA for archive ingestion - http://arrivals.ceda.ac.uk/. Please ask your CEDA liaison officer if you need to know which ingest stream to use.
Any ftp client can be used to connect to CEDA's ftp arrivals server:
After logging in with your CEDA account credentials you will arrive at your delivery area, which will contain a sub-directory for each data "stream" to which you are permitted to deposit data. If you are unable to locate the required sub-directory or are unsure which one to use, please contact your CEDA team liaison.
If possible, please use a temporary filename during transfer and rename the file once the transfer has completed, so that we are able to distinguish partially transferred files from completely transferred ones.
Please note - we ask depositors not to deposit any files into their top folder, but to always use one of the available sub-folders (please see the note above about ingestion streams).
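The temporary-filename advice above could be sketched as follows, assuming the curl command-line client is used for the FTP upload. This is only an illustration, not a CEDA-provided script: FTP_HOST, STREAM_DIR, CEDA_USER, the ".part" suffix and the example file name are all hypothetical placeholders, and the real server address and stream directory should come from your CEDA team liaison.

```shell
#!/bin/sh
# FTP_HOST, STREAM_DIR and CEDA_USER are hypothetical placeholders; use the
# values supplied by your CEDA team liaison.
FTP_HOST="${FTP_HOST:-ftp-arrivals.example.invalid}"
STREAM_DIR="${STREAM_DIR:-my_stream}"
CEDA_USER="${CEDA_USER:-fbloggs}"

# Upload one file under a temporary ".part" name and rename it afterwards.
# The "-" prefix on each --quote command asks curl to send that FTP command
# only after a successful transfer, so a partial upload keeps its ".part" name.
# With DRY_RUN=1 (the default here) the command is printed for review instead
# of being executed; the printed form does not preserve shell quoting.
upload_with_rename() {
  file="$1"
  name="$(basename "$file")"
  set -- curl --user "$CEDA_USER" --upload-file "$file" \
    "ftp://$FTP_HOST/$STREAM_DIR/$name.part" \
    --quote "-RNFR $name.part" --quote "-RNTO $name"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

upload_with_rename "data/obs_20240101.nc"
```

Setting DRY_RUN=0 would run the transfer for real, with curl prompting for the account password. Many graphical FTP clients offer an equivalent "upload with temporary name" setting.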
Details of how to use ftp are available here (note, the general ftp guide has been written using the download ftp server, as opposed to the upload ftp server).
Once you have your RSYNC server account login (this is different from your normal CEDA account login - available from your CEDA team liaison officer) data can be rsynced as follows:
rsync -av --password-file=<path to password file> <path to source> <ceda account id>@arrivals.ceda.ac.uk::<ceda_account id>/<TARGET_DIR>
rsync -av --password-file=mypasswordfile.txt data_dir fbloggs@arrivals.ceda.ac.uk::fbloggs/upload_dir
- Note the double colon between the arrivals.ceda.ac.uk address and the user's account ID.
- The option to use a file to hold the rsync account password is shown above. This is recommended for routine deliveries, but may not be needed for rsync transfers run by hand.
- If the password file approach is used (see the previous note) then the password file must be readable and writable by its owner only, i.e. it should show as having permissions "-rw-------". Otherwise a "password file must not be other-accessible" error will occur.
- Other rsync options can also be used; for example, -a (archive mode, as used above) recurses down through the source.
- If you want to rsync the contents of a source directory rather than the directory itself, add a trailing slash ("/") to the source path. No trailing slash is needed on the target directory path.
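The password file permissions mentioned in the notes above can be set and checked before the first transfer. The following is a minimal sketch: the helper name secure_password_file is made up for illustration, and a throwaway temporary file stands in for the real password file.

```shell
#!/bin/sh
# Restrict a password file to owner read/write only, then print the first ten
# characters of its "ls -l" entry (the file type and permission string).
secure_password_file() {
  chmod 600 "$1" && ls -l "$1" | cut -c1-10
}

# Demonstration on a throwaway file; in practice this would be the file
# passed to rsync's --password-file option.
demo_file="$(mktemp)"
mode="$(secure_password_file "$demo_file")"
echo "$mode"   # prints -rw-------
rm -f "$demo_file"
```

If the printed string shows any group or other permissions, rsync is likely to refuse the file with the error quoted above.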
Please be aware, however, that RSYNC will carry out a full comparison between the source and destination. Thus, if you wish to send only a few files from your source to update the CEDA archive holdings, care is needed to avoid unnecessarily transferring large parts of the source to the CEDA system.
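One possible way to limit a transfer to a few files is rsync's --files-from option, combined with --dry-run to preview what would be sent. The sketch below runs entirely on local throwaway directories (SRC, DEST and LIST are illustrative names, not CEDA paths); for a real delivery the destination would be the arrivals path shown earlier, and the approach should be agreed with your CEDA team liaison first.

```shell
#!/bin/sh
# Local demonstration: SRC and DEST are throwaway directories standing in for
# your source directory and the CEDA delivery area.
SRC="$(mktemp -d)"
DEST="$(mktemp -d)"
LIST="$(mktemp)"
touch "$SRC/a.nc" "$SRC/b.nc" "$SRC/c.nc"

# List only the files to send, one per line, relative to the source directory.
printf 'b.nc\n' > "$LIST"

if command -v rsync >/dev/null 2>&1; then
  # Preview first: --dry-run reports what would be sent without copying anything.
  rsync -av --dry-run --files-from="$LIST" "$SRC/" "$DEST/"
  # Then transfer for real; only the listed file is considered.
  rsync -av --files-from="$LIST" "$SRC/" "$DEST/"
else
  echo "rsync not found; skipping demonstration"
fi

# Show what arrived at the destination.
ls "$DEST"
```

Because only the listed paths are examined, the rest of the source directory is neither compared nor transferred.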
After logging in with your CEDA account credentials you will arrive at your delivery area, which will contain sub-directories for each data "ingest stream" to which you are permitted to deposit data. If you are unable to locate the required sub-directory or are unsure which one to use, please contact your CEDA team liaison.
Please note - we ask depositors not to deposit any files into their top folder, but to always use one of the available sub-folders.
Ingest from Group-workspaces/Project-spaces
CEDA supports projects through shared storage spaces such as JASMIN group workspaces or FTP project spaces. Users of these services should understand that:
- this is NOT the archive - placing data into these areas will not constitute having deposited data in the CEDA archive
- the group-workspaces/project-spaces are NOT managed by CEDA and so content should be considered at risk
However, it is possible to prepare a dataset in these areas for eventual ingestion into the archive. If you wish to do this, please contact your CEDA support officer in the first instance to discuss ingestion into the archive, as it may be possible to ingest directly from these areas.