Reading a NetCDF file from a Python Script using OpenDAP
This article is now obsolete, and we recommend that you refer to the Python NetCDF4 subsetting example at the bottom of our Archive Access Tokens article.
This page provides an example of how you can connect remotely to a NetCDF file in the CEDA Archive. It uses the OpenDAP protocol, which allows you to query and subset a NetCDF file stored on a remote server. For other ways to connect to OpenDAP, please see the scripted interactions page.
- The Python script
- Setting up an environment to run the python script (Python 2.7)
- Setting up an environment to run the python script (Python 3.7 / Jaspy)
- Using the script
- Explanation of the script
- Finding the OpenDAP URL
The Python script
The example script can be found at https://github.com/cedadev/opendap-python-example. Download the repository and follow the instructions to use remote_nc_reader.py.
The script "remote_nc_reader.py" provides an example of using Python 2.7 to connect to a file in the CEDA Archive and read both its metadata and its data. The example requires that the user is registered with CEDA. The "setup_credentials()" function downloads and writes the certificate files required by the "netCDF4" library.
Setting up an environment to run the python script (Python 2.7)
Note: This process can be sensitive to the version of the NetCDF library. We have had the most success when using conda to create the virtual environment and install netcdf4.
The script requires a few Python packages. To make sure you can use it, you may need to create your own "virtual environment" and install the required packages as follows:
$ virtualenv venv
$ source venv/bin/activate
$ pip install ContrailOnlineCAClient netCDF4
Once the virtual environment has been created, you can reuse it the next time you log in by activating it with:
$ source venv/bin/activate
Setting up an environment to run the python script (Python 3.7 / Jaspy)
To run the script using the Jaspy software environment, a virtual environment which inherits the packages from the Jaspy Conda installation can be used:
$ module load jaspy
$ python -m venv --system-site-packages ~/pyvenv3
$ source ~/pyvenv3/bin/activate
$ pip install ContrailOnlineCAClient
Once the virtual environment has been created, you can reuse it the next time you log in by activating it with:
$ source ~/pyvenv3/bin/activate
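Before running the script, it can be useful to confirm that the activated environment really provides the required packages. The following is a minimal sketch, assuming Python 3; the helper name "missing_packages" is hypothetical and is not part of the example repository.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "netCDF4" is the import name of the netCDF4-python library.
missing = missing_packages(["netCDF4"])
if missing:
    print("Missing packages: {}".format(", ".join(missing)))
else:
    print("All required packages were found.")
```

If a package is reported as missing, check that the virtual environment has been activated in the current shell before reinstalling anything.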
Using the script
To use the script you will need to ensure that Python and the required dependencies (see above) are available, and that you have a valid CEDA account with access to the URL that you intend to connect to. The script could be used as follows:
$ export CEDA_USERNAME=some_user
$ export CEDA_PASSWORD=some_password
$ python remote_nc_reader.py http://dap.ceda.ac.uk/thredds/dodsC/badc/ukcp18/data/marine-sim/skew-trend/rcp85/skewSurgeTrend/latest/skewSurgeTrend_marine-sim_rcp85_trend_2007-2099.nc skewSurgeTrend
The first two lines set the user details that are picked up by the "remote_nc_reader.py" script. The two arguments provided at the command-line are:
- The URL to a remote NetCDF file (on an OpenDAP server).
- The variable ID in that NetCDF file that you want to interrogate.
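The inputs described above (two environment variables and two command-line arguments) could be collected along the following lines. This is only a sketch: the helper "read_inputs" and its error messages are hypothetical, and the real script may handle its inputs differently.

```python
def read_inputs(argv, environ):
    """Collect the script inputs from an argument list and an environment mapping.

    Hypothetical helper: returns (username, password, url, var_id).
    """
    if len(argv) != 3:
        raise SystemExit("Usage: {} <opendap_url> <variable_id>".format(argv[0]))
    try:
        username = environ["CEDA_USERNAME"]
        password = environ["CEDA_PASSWORD"]
    except KeyError as err:
        raise SystemExit("Environment variable {} is not set.".format(err.args[0]))
    return username, password, argv[1], argv[2]


# In a real script the arguments would be sys.argv and os.environ;
# literal placeholder values are used here so the sketch runs on its own.
inputs = read_inputs(
    ["remote_nc_reader.py", "http://dap.ceda.ac.uk/thredds/dodsC/example.nc", "skewSurgeTrend"],
    {"CEDA_USERNAME": "some_user", "CEDA_PASSWORD": "some_password"},
)
print(inputs[2], inputs[3])
```

Passing the argument list and environment in as parameters keeps the helper easy to test without touching the real shell environment.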
Explanation of the script
The above script includes a number of stages that deserve some explanation. This section shows each function and explains how it works. The "main()" function is where the script starts:
    def main(nc_file_url, var_id):
        """
        Main controller function.

        :param nc_file_url: URL to a NetCDF4 opendap end-point.
        :param var_id: Variable ID [String]
        :return: None
        """
        setup_credentials(force=False)
        ds = get_nc_dataset(nc_file_url, var_id)
The "main()" function takes two arguments from the command-line: (1) the URL to a remote NetCDF file (on an OpenDAP server) and (2) the variable ID in that NetCDF file that you want to interrogate.
Before the "main()" function attempts to contact the remote OpenDAP server it calls the function "setup_credentials()":
    def setup_credentials(force=False):
        """
        Download and create required credentials files.

        Return True if credentials were set up.
        Return False if credentials were already set up.

        :param force: boolean
        :return: boolean
        """
        # Skip the download only if DODS_FILE exists AND `force` is False
        # AND the certificate is in-date; otherwise re-get the credentials.
        if os.path.isfile(DODS_FILE_PATH) and not force and cert_is_valid(CREDENTIALS_FILE_PATH):
            print('[INFO] Security credentials already set up.')
            return False

        onlineca_client = OnlineCaClient()
        onlineca_client.ca_cert_dir = TRUSTROOTS_DIR

        # Set up trust roots
        trustroots = onlineca_client.get_trustroots(
            TRUSTROOTS_SERVICE,
            bootstrap=True,
            write_to_ca_cert_dir=True)

        # Write certificate credentials file.
        # `username` and `password` are the CEDA account details
        # (set via CEDA_USERNAME and CEDA_PASSWORD).
        key_pair, certs = onlineca_client.get_certificate(
            username,
            password,
            CERT_SERVICE,
            pem_out_filepath=CREDENTIALS_FILE_PATH)

        # Write the dodsrc credentials file
        write_dods_file_contents()

        print('[INFO] Security credentials set up.')
        return True
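The check at the top of "setup_credentials()" can be read as a small decision rule: the download is skipped only when the dodsrc file exists, "force" is False and the certificate is still valid. The sketch below restates that rule as a standalone function; the name "needs_new_credentials" is hypothetical and does not appear in the script.

```python
def needs_new_credentials(dods_file_exists, force, cert_valid):
    """Return True when the credentials should be downloaded again."""
    return (not dods_file_exists) or force or (not cert_valid)


# Print the outcome for every combination of the three conditions.
for dods_file_exists in (True, False):
    for force in (True, False):
        for cert_valid in (True, False):
            print(dods_file_exists, force, cert_valid,
                  needs_new_credentials(dods_file_exists, force, cert_valid))
```

Only one of the eight combinations (file present, not forced, certificate valid) leaves the existing credentials untouched.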
The "setup_credentials()" function makes sure that the appropriate certificate files have been downloaded and saved to your local $HOME directory. As part of this process, a check is done to ensure that the credentials file contains a valid certificate. This is done using the "cert_is_valid()" function, as follows:
    import datetime

    from cryptography import x509
    from cryptography.hazmat.backends import default_backend


    def cert_is_valid(cert_file, min_lifetime=0):
        """
        Returns boolean - True if the certificate is in date.
        Optional argument min_lifetime is the number of seconds
        which must remain.

        :param cert_file: certificate file path.
        :param min_lifetime: minimum lifetime (seconds)
        :return: boolean
        """
        try:
            # Read as bytes: load_pem_x509_certificate() expects bytes on Python 3.
            with open(cert_file, 'rb') as f:
                crt_data = f.read()
        except IOError:
            return False

        try:
            cert = x509.load_pem_x509_certificate(crt_data, default_backend())
        except ValueError:
            return False

        # The certificate validity times are naive datetimes in UTC,
        # so compare them against the current UTC time.
        now = datetime.datetime.utcnow()

        return (cert.not_valid_before <= now and
                cert.not_valid_after > now + datetime.timedelta(0, min_lifetime))
The "cert_is_valid()" function loads the existing credentials file (if there is one) and checks that the certificate has not expired. It returns "True" only if the certificate is already valid and will remain valid for at least the number of seconds given by the second argument, "min_lifetime".
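To make the effect of "min_lifetime" concrete, the comparison can be tried with plain datetimes and no certificate file. This is a sketch only: the function name "in_validity_window" is hypothetical, and the dates are made-up values chosen for illustration.

```python
import datetime


def in_validity_window(not_before, not_after, now, min_lifetime=0):
    """Return True if `now` is inside the window with `min_lifetime` seconds to spare."""
    margin = datetime.timedelta(seconds=min_lifetime)
    return not_before <= now and not_after > now + margin


# Illustrative values: a certificate valid for three days,
# checked 30 minutes before it expires.
not_before = datetime.datetime(2024, 1, 1, 0, 0)
not_after = datetime.datetime(2024, 1, 4, 0, 0)
now = datetime.datetime(2024, 1, 3, 23, 30)

print(in_validity_window(not_before, not_after, now))                     # still in date
print(in_validity_window(not_before, not_after, now, min_lifetime=3600))  # less than an hour left
```

With the default of zero the certificate counts as valid until the moment it expires. A larger value treats a certificate that is about to expire as invalid, so that it can be renewed before a long-running job starts.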
Once the credentials files are all in place and valid then the "main()" function calls the "get_nc_dataset()" function as follows:
    from netCDF4 import Dataset


    def get_nc_dataset(url, var_id):
        """
        Open a remote connection to a NetCDF4 Dataset at `url`.
        Show information about variable `var_id`.
        Print metadata / data in the file and return the Dataset object.

        :param url: URL to a NetCDF OpenDAP end-point.
        :param var_id: Variable ID in NetCDF file [string]
        :return: netCDF4 Dataset object
        """
        dataset = Dataset(url)

        print('\n[INFO] Global attributes:')
        for attr in dataset.ncattrs():
            print('\t{}: {}'.format(attr, dataset.getncattr(attr)))

        print('\n[INFO] Variables:\n{}'.format(dataset.variables))
        print('\n[INFO] Dimensions:\n{}'.format(dataset.dimensions))

        print('\n[INFO] Max and min variable: {}'.format(var_id))
        # Indexing with [:] reads the whole variable from the remote server
        variable = dataset.variables[var_id][:]
        units = dataset.variables[var_id].units
        print('\tMin: {:.6f} {}; Max: {:.6f} {}'.format(variable.min(), units, variable.max(), units))

        return dataset
The "get_nc_dataset()" function is provided to demonstrate that you can interact with a secured NetCDF file on a remote OpenDAP server using the same interface (netCDF4-python) that you would use on a local file. In this example the following information is printed:
- Global attributes
- Variables
- Dimensions
- Min and max of requested variable
The "get_nc_dataset()" function returns the netCDF4 Dataset Python object, which you can interrogate further as required.
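Because the remote Dataset is indexed in the same way as a local one, one way to interrogate it further is to request only a small block of a variable rather than the whole array. The sketch below builds such an index from a variable's shape. The helper "leading_block" is hypothetical, the shape shown is made up for illustration, and the sketch assumes a variable with at least one dimension.

```python
def leading_block(shape, max_len=10):
    """Build an index covering at most `max_len` elements along each dimension."""
    return tuple(slice(0, min(size, max_len)) for size in shape)


# Illustrative shape only; a real one would come from
# dataset.variables[var_id].shape
index = leading_block((93, 5, 200))
print(index)

# With an open netCDF4 Dataset, the block could then be read with:
#     subset = dataset.variables[var_id][index]
```

Requesting a small block first is a cheap way to check units and value ranges before reading a large variable in full.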
Finding the OpenDAP URL
To discover the URL to use with the netCDF4 Dataset object above:
- Use the CEDA archive browser to navigate to the dataset you wish to open (for example CRU TS temperature data).
- On the right hand side there is a column of download icons. These are for downloading the data directly, and are not for use with OpenDAP.
- For datasets that support OpenDAP access there will be an additional icon, which looks like cogs.
- Click the cog icon.
- This takes you to a page where you could subset the data and download it.
- However, we are interested in the URL contained in the box marked "Data URL".
- Copy this URL from the box by triple-clicking in the box, clicking the right mouse button on the highlighted text and selecting "Copy".
- You can now paste the URL into your code. It will start with:
http://dap.ceda.ac.uk/thredds/dodsC/
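A quick check in code can catch the common mistake of pasting a direct download link instead of the "Data URL". The following is a minimal sketch based only on the prefix quoted above; the function name "looks_like_opendap_url" is hypothetical, and the check would need adjusting if the service address differs.

```python
OPENDAP_PREFIX = "http://dap.ceda.ac.uk/thredds/dodsC/"


def looks_like_opendap_url(url):
    """Return True if `url` starts with the OpenDAP prefix and names a path after it."""
    return url.startswith(OPENDAP_PREFIX) and len(url) > len(OPENDAP_PREFIX)


example = (
    "http://dap.ceda.ac.uk/thredds/dodsC/badc/ukcp18/data/marine-sim/"
    "skew-trend/rcp85/skewSurgeTrend/latest/"
    "skewSurgeTrend_marine-sim_rcp85_trend_2007-2099.nc"
)
print(looks_like_opendap_url(example))
```

The check says nothing about whether the file exists or whether your account has access to it; it only confirms the shape of the URL.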