Using Archive Access Tokens
Many datasets in the CEDA archive have specific auditing policies or access restrictions in place which prevent users who are not logged in, or not authorised, from downloading their data. This can pose a problem for anonymous scripted interactions, such as bulk file downloads (with Wget, Python requests, etc.) or data sub-setting (e.g. NetCDF libraries or HTTP range GET requests). To solve this problem, we provide access tokens that can be used in scripts to act on behalf of your CEDA account for the purposes of accessing data on the CEDA Archive.
How to Generate an Access Token
- Access Restrictions
- Generating Tokens from the CEDA Services Website
- Generating Tokens with the Token API
Using Access Tokens in Scripts (examples)
- Finding CEDA Archive File URLs
- Using Wget
- Using Curl
- Using Python Requests
- Using Python NetCDF4
- Complete Python Example Script
What is an Access Token?
The kind of access token we issue is something called a Bearer Token (from the OAuth 2.0 protocol). It is an opaque string which can be used in HTTP requests to uniquely identify the owner of the token for the purposes of verifying their identity. It is added to a request as part of the HTTP "Authorization Header".
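As a minimal sketch of what "adding the token to the Authorization header" means in practice, the snippet below builds a request object with Python's standard library and inspects the header it would send. The helper name `authorised_request` is made up for illustration, and no network call is made.

```python
from urllib.request import Request


def authorised_request(url: str, token: str) -> Request:
    """Build a GET request that carries the token as a Bearer credential."""
    return Request(url, headers={"Authorization": f"Bearer {token}"})


req = authorised_request(
    "https://dap.ceda.ac.uk/badc/csip/data/salford-radiometer-1/2005/06/",
    "INSERT_ACCESS_TOKEN_HERE",
)
# The header value is the word "Bearer", a space, then the opaque token string
print(req.get_header("Authorization"))
```

The same header can be attached by any HTTP client.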
As a security measure, access tokens that we issue have a limited lifespan.
How to Generate an Access Token
First, you will need to log in to your user account (if you don't already have one, you can Register for a CEDA Account).
Access Restrictions
Tokens cannot be used to access data that your CEDA account doesn't already have access to. Many datasets just require login to access, but some have additional licences that you must sign up to before you can access them. You can find information about the access restrictions for datasets, along with relevant sign-up links, on our Data Catalogue. Note that some datasets have policy agreements that you must first agree to, and you may need to wait before your access is approved by a moderator.
Once you're logged in (and your account has the access permissions you need) you can now generate access tokens to use in scripted interactions. See below for details of the different ways to do this.
Generating Tokens from the CEDA Services Website
The simplest way to generate a token is by using the access token generator on the CEDA services website. Note that, since these tokens will only have a lifespan of 3 days and cannot be refreshed, this method may not be suitable for scripts that are intended to run for a long period of time. For these purposes, see the section about the token API further down.
To generate a new token, click the "Create new access token" button, and enter your CEDA user password and (optionally) a name for the token.
After generating a token, you can copy it to use in scripts by clicking "Copy", or delete unneeded tokens using "Delete". You can also see the expiry date of each of your tokens. You can generate a maximum of 2 active tokens using this method.
Generating Tokens with the Token API
Fresh tokens can also be requested from our Token API. This makes it possible to refresh expired tokens inside of your scripts, avoiding the manual step of generating and copying tokens from the website.
The API endpoint is: https://services-beta.ceda.ac.uk/api/token/create/
This is secured with HTTP basic authentication, requiring you to provide your CEDA login credentials in the header of a POST request.
Below are some examples of scripts that will generate a token from the API.
Using a bash script and Curl:
CEDA_USERNAME='my CEDA username'
CEDA_PASSWORD='my CEDA password'

# If successful, this will return a JSON response containing the token
curl --location --request POST 'https://services-beta.ceda.ac.uk/api/token/create/' \
  --header "Authorization: Basic $(echo -n "${CEDA_USERNAME}:${CEDA_PASSWORD}" | base64)"
From inside a Python script (uses requests):
import requests
from base64 import b64encode

url = "https://services-beta.ceda.ac.uk/api/token/create/"
username = "my CEDA username"
password = "my CEDA password"

# Encode the credentials for HTTP basic authentication
credentials = b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
headers = {
    "Authorization": f"Basic {credentials}",
}

response = requests.post(url, headers=headers)

# If successful, this will return a JSON response containing the token
print(response.text)
if response.status_code == 200:
    token = response.json()["access_token"]
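Because tokens expire, a long-running script may want to keep a token on disk and request a new one only when the stored copy is too old. The sketch below shows one way to do that. It is not the caching used in CEDA's own example script. The helper name, the cache file layout and the `fetch_token` callback are all hypothetical, and the maximum age assumes that API tokens last about as long as the 3-day website tokens, which you should check against the expiry you actually observe.

```python
import json
import time
from pathlib import Path
from typing import Callable

# Assumption: API tokens live roughly 3 days, like website tokens.
# Refresh ten minutes early to leave a safety margin.
MAX_AGE_SECONDS = 3 * 24 * 3600 - 600


def get_cached_token(
    cache_file: Path,
    fetch_token: Callable[[], str],
    max_age: float = MAX_AGE_SECONDS,
) -> str:
    """Return the cached token, calling fetch_token() only if it is missing or stale."""
    if cache_file.exists():
        cached = json.loads(cache_file.read_text())
        if time.time() - cached["fetched_at"] < max_age:
            return cached["access_token"]
    # No usable cached token: request a fresh one and remember when we got it
    token = fetch_token()
    cache_file.write_text(
        json.dumps({"access_token": token, "fetched_at": time.time()})
    )
    return token
```

Here `fetch_token` would be a function of your own that wraps the Token API request and returns the `access_token` string. Since the cache file holds a live credential, keep it somewhere only your user account can read.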
Using Access Tokens in Scripts
There are multiple ways to use tokens in scripted interactions. First, you'll need a valid URL to the file in the CEDA archive that you're interested in. Then your access token must be added to the HTTP header using the "Bearer Token Authorization Header" standard. In this section, we've included examples with some common ways to access data.
Finding CEDA Archive File URLs
All of the examples below assume that you have an appropriate URL for a file in the CEDA archive.
Here are some steps you can follow to find a file URL:
- Find a dataset that you wish to access files from in the CEDA Archive Browser (for example, UKMO-midas-open). You can also use the CEDA Data Catalogue to search for files, and then press the "Download" button to be sent to the corresponding location on the Archive Browser.
Down the right hand side of the list of files, there will be a download icon for each file in the dataset:
- Right click this download icon and select "Copy link address" (this may be different depending on what browser you're using).
- Now you can paste that link address into your script, or onto the command line. The links will start with https://dap.ceda.ac.uk/ followed by the path to the file in the archive.
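Since the links are simply https://dap.ceda.ac.uk/ followed by the archive path, a script that already knows the archive path can build the URL itself. The two helpers below are a small illustrative sketch (the function names are made up here), and they assume the path needs no extra URL escaping.

```python
from urllib.parse import urlparse

DAP_BASE = "https://dap.ceda.ac.uk"


def archive_path_to_url(path: str) -> str:
    """Turn an archive path such as /badc/... into a download URL."""
    return f"{DAP_BASE}/{path.lstrip('/')}"


def url_to_archive_path(url: str) -> str:
    """Recover the archive path from a download URL."""
    return urlparse(url).path


url = archive_path_to_url(
    "/badc/csip/data/salford-radiometer-1/2005/06/"
    "salford-radiometer-1_faccombe_20050624_hpc.nc"
)
print(url)
print(url_to_archive_path(url))
```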
Lastly, make sure that the file you're interested in is either open access (can be downloaded anonymously), or that your CEDA account has permission to access the file (see Access Restrictions).
You can check whether you have permission to access the file by attempting to download it in a browser. If you're able to download the file in a browser (you may need to log in first), then any token that you generate should also be able to. The exception to this is if you're using a token that was generated before you had access to the data.
Using Wget
To use an access token with a Wget operation, add the token to the Authorization header and include the header with the "--header" command line option.
Below is an example of using the token to download a single file:
wget https://dap.ceda.ac.uk/badc/csip/data/salford-radiometer-1/2005/06/salford-radiometer-1_faccombe_20050624_hpc.nc --header "Authorization: Bearer INSERT_ACCESS_TOKEN_HERE"
You can also use the token with a Wget command downloading multiple files in a directory:
wget -e robots=off --mirror --no-parent -r https://dap.ceda.ac.uk/badc/csip/data/salford-radiometer-1/2005/06/ --header "Authorization: Bearer INSERT_ACCESS_TOKEN_HERE"
Using Curl
Similar to Wget, Curl also supports adding a header using the "-H" command line option:
curl -L -H 'Authorization: Bearer INSERT_ACCESS_TOKEN_HERE' https://dap.ceda.ac.uk/badc/csip/data/salford-radiometer-1/2005/06/salford-radiometer-1_faccombe_20050624_hpc.nc > result
Using Python Requests
The Python requests module allows you to set the Bearer Token Authorization Header before sending a request. The example below downloads a single file:
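import os
import requests
from urllib.parse import urlparse

url = "https://dap.ceda.ac.uk/badc/csip/data/salford-radiometer-1/2005/06/salford-radiometer-1_faccombe_20050624_iwv.nc"
token = "INSERT_ACCESS_TOKEN_HERE"

headers = {
    "Authorization": f"Bearer {token}"
}
response = requests.get(url, headers=headers)
response.raise_for_status()

# NetCDF files are binary, so save the raw bytes rather than printing them as text
filename = os.path.basename(urlparse(url).path)
with open(filename, "wb") as f:
    f.write(response.content)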
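For large files it can be preferable to write the response to disk in chunks instead of holding the whole body in memory. The following is a sketch of that pattern using the `stream=True` and `iter_content` features of requests. The function names, the 1 MiB chunk size and the 60 second timeout are illustrative choices, not CEDA recommendations.

```python
import os
from typing import Iterable
from urllib.parse import urlparse


def filename_from_url(url: str) -> str:
    """Use the last component of the URL path as the local filename."""
    return os.path.basename(urlparse(url).path)


def write_chunks(chunks: Iterable[bytes], dest: str) -> int:
    """Write an iterable of byte chunks to dest and return the bytes written."""
    written = 0
    with open(dest, "wb") as f:
        for chunk in chunks:
            written += f.write(chunk)
    return written


def download(url: str, token: str, dest_dir: str = ".") -> str:
    """Stream a token-protected file to dest_dir and return the local path."""
    import requests  # imported here so the helpers above work without it

    dest = os.path.join(dest_dir, filename_from_url(url))
    headers = {"Authorization": f"Bearer {token}"}
    with requests.get(url, headers=headers, stream=True, timeout=60) as response:
        response.raise_for_status()  # stop on 4xx/5xx instead of saving an error page
        write_chunks(response.iter_content(chunk_size=1024 * 1024), dest)
    return dest
```

Calling `download(url, token)` with a file URL and a valid token should leave a file named after the last part of the URL in the current directory.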
Using Python NetCDF4
Normally, it is possible to create NetCDF Dataset objects directly using an open access file URL from our archive. This avoids having to download the entire file, which may be very large. However, trying to do this with a restricted access URL will fail.
To get around this problem, we can request the file separately with the access token included in the request header, then pass the returned bytes to the Dataset object at initialisation to create an "in-memory" Dataset (read more about this in the NetCDF4 documentation). Setting stream=True defers fetching the response body until it is accessed. Note that reading response.content then loads the complete file into memory, so nothing is written to disk but the whole file is still transferred.
import os
import requests
from netCDF4 import Dataset
from urllib.parse import urlparse

url = "https://dap.ceda.ac.uk/badc/csip/data/salford-radiometer-1/2005/06/salford-radiometer-1_faccombe_20050624_iwv.nc"
token = "INSERT_ACCESS_TOKEN_HERE"

headers = {
    "Authorization": f"Bearer {token}"
}
response = requests.get(url, headers=headers, stream=True)
response.raise_for_status()

# The filename is only a label here; the data is read from the in-memory buffer
filename = os.path.basename(urlparse(url).path)
dataset = Dataset(filename, memory=response.content)
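The introduction mentions HTTP range GET requests as one way of sub-setting data. As a hedged sketch, a Range header can be sent alongside the Bearer token to ask for only part of a file. The helper name below is made up for illustration, and whether the server honours the range for a particular file is an assumption to verify, for example by checking for a 206 (Partial Content) status code in the response.

```python
def range_headers(token: str, first_byte: int, last_byte: int) -> dict:
    """Build headers requesting bytes first_byte..last_byte (inclusive) of a file."""
    if first_byte < 0 or last_byte < first_byte:
        raise ValueError("expected 0 <= first_byte <= last_byte")
    return {
        "Authorization": f"Bearer {token}",
        "Range": f"bytes={first_byte}-{last_byte}",
    }


# Headers asking for the first 1024 bytes of a file
headers = range_headers("INSERT_ACCESS_TOKEN_HERE", 0, 1023)
print(headers["Range"])
```

These headers can be passed to `requests.get(url, headers=headers)` in the same way as in the examples above. A 200 status code in the reply would indicate that the full file was returned instead of the requested range.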
Complete Python Example Script
We've written an example Python script which demonstrates the complete process of fetching a token using the Token API (and caching it for later use), then opening a restricted access NetCDF file from a URL as a streamed Dataset.
Feel free to copy and adapt this to suit your workflow. You can find the script at the following link:
https://github.com/cedadev/opendap-python-example/blob/master/remote_nc_with_token.py