Reading a NetCDF file from a Python Script using OpenDAP

This page provides an example of how you can connect remotely to a NetCDF file in the CEDA Archive. It uses the OpenDAP protocol, which allows you to query and subset a NetCDF file stored on a remote server. For other ways to connect to OpenDAP, please see the scripted interactions page.
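At the protocol level, an OpenDAP (DAP2) client asks for a subset by appending a constraint expression to the dataset URL: the variable name followed by one "[start:stride:stop]" hyperslab per dimension, where "stop" is inclusive. The netCDF library builds such requests for you when you slice a variable, so the hypothetical helper "dap2_subset_url()" below only illustrates the URL format. The ".dods" suffix asks for binary data, while ".dds" and ".das" return the dataset structure and attributes; the example.org URL is a placeholder.

```python
def dap2_subset_url(url, var_id, hyperslabs, suffix='.dods'):
    """Build a DAP2 request URL for a subset of one variable (illustrative sketch).

    `hyperslabs` is a list of (start, stride, stop) tuples, one per dimension;
    `stop` is inclusive, as in DAP2 constraint expressions.
    """
    parts = []
    for start, stride, stop in hyperslabs:
        # Reject hyperslabs that DAP2 servers would not accept
        if start < 0 or stride < 1 or stop < start:
            raise ValueError('invalid hyperslab: {}'.format((start, stride, stop)))
        parts.append('[{}:{}:{}]'.format(start, stride, stop))
    return '{}{}?{}{}'.format(url, suffix, var_id, ''.join(parts))


if __name__ == '__main__':
    print(dap2_subset_url('http://example.org/opendap/file.nc', 'tas',
                          [(0, 1, 11), (10, 2, 20)]))
    # http://example.org/opendap/file.nc.dods?tas[0:1:11][10:2:20]
```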

This page has four sections:

  • The Python script
  • Using the script
  • Explanation of the script
  • Setting up an environment to run the python script

The Python script

The following script, "remote_nc_reader.py", provides an example of using a Python 2.7 script to connect to, and read both metadata and data from, a file in the CEDA Archive. The example requires the user to be registered with CEDA; its "setup_credentials()" function downloads and writes the certificate files that the "netCDF4" library needs in order to authenticate.

#!/usr/bin/env python

"""
remote_nc_reader.py
===================

Python script for reading a NetCDF file remotely from the CEDA archive.

Pre-requisites:

 - Python2.7
 - Python libraries (installed by Pip):

```
ContrailOnlineCAClient
requests
pip==9.0.3
netCDF4
cryptography
```

Usage:

```
$ python remote_nc_reader.py <url> <var_id>
```

Example:

```
$ URL_DIR=http://data.ceda.ac.uk/badc/ukcp18/data/marine-sim/skew-trend/rcp85/skewSurgeTrend/latest
$ FILE_NAME=skewSurgeTrend_marine-sim_rcp85_trend_2007-2099.nc
$ URL=${URL_DIR}/${FILE_NAME}
$ VAR_ID=skewSurgeTrend

$ python remote_nc_reader.py $URL $VAR_ID
```

"""

# Import standard libraries
import os
import sys
import datetime

# Get CEDA username and password from environment variables
username = os.environ['CEDA_USERNAME']
password = os.environ['CEDA_PASSWORD']

# Import third-party libraries
from cryptography import x509
from cryptography.hazmat.backends import default_backend

from contrail.security.onlineca.client import OnlineCaClient
from netCDF4 import Dataset


# Credentials defaults
DODS_FILE_PATH = os.path.expanduser('~/.dodsrc')
CERTS_DIR = os.path.expanduser('~/.certs')

if not os.path.isdir(CERTS_DIR):
    os.makedirs(CERTS_DIR)

TRUSTROOTS_DIR = os.path.join(CERTS_DIR, 'ca-trustroots')
CREDENTIALS_FILE_PATH = os.path.join(CERTS_DIR, 'credentials.pem')

# Contents of ~/.dodsrc. The certificate paths are absolute so that they match
# the files written by setup_credentials(), whatever the working directory.
DODS_FILE_CONTENTS = """HTTP.COOKIEJAR=./dods_cookies
HTTP.SSL.CERTIFICATE={creds}
HTTP.SSL.KEY={creds}
HTTP.SSL.CAPATH={trustroots}
""".format(creds=CREDENTIALS_FILE_PATH, trustroots=TRUSTROOTS_DIR)

TRUSTROOTS_SERVICE = 'https://slcs.ceda.ac.uk/onlineca/trustroots/'
CERT_SERVICE = 'https://slcs.ceda.ac.uk/onlineca/certificate/'


def cert_is_valid(cert_file, min_lifetime=0):
    """
    Returns boolean - True if the certificate is in date.
    Optional argument min_lifetime is the number of seconds
    which must remain.

    :param cert_file: certificate file path.
    :param min_lifetime: minimum lifetime (seconds)
    :return: boolean
    """
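    try:
        # Read as bytes: load_pem_x509_certificate() expects bytes on Python 3
        with open(cert_file, 'rb') as f:
            crt_data = f.read()
    except IOError:
        return False

    try:
        cert = x509.load_pem_x509_certificate(crt_data, default_backend())
    except ValueError:
        return False

    # The certificate's validity dates are naive UTC datetimes, so compare with UTC now
    now = datetime.datetime.utcnow()

    return (cert.not_valid_before <= now
            and cert.not_valid_after > now + datetime.timedelta(0, min_lifetime))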


def setup_credentials(force=False):
    """
    Download and create required credentials files.

    Return True if credentials were set up.
    Return False if credentials were already set up.

    :param force: boolean
    :return: boolean
    """
    # Re-get credentials only if the DODS file is missing, `force` is True,
    # or the certificate is missing or out of date.
    if os.path.isfile(DODS_FILE_PATH) and not force and cert_is_valid(CREDENTIALS_FILE_PATH):
        print('[INFO] Security credentials already set up.')
        return False

    onlineca_client = OnlineCaClient()
    onlineca_client.ca_cert_dir = TRUSTROOTS_DIR

    # Set up trust roots
    trustroots = onlineca_client.get_trustroots(
        TRUSTROOTS_SERVICE,
        bootstrap=True,
        write_to_ca_cert_dir=True)

    # Write certificate credentials file
    key_pair, certs = onlineca_client.get_certificate(
        username,
        password,
        CERT_SERVICE,
        pem_out_filepath=CREDENTIALS_FILE_PATH)

    # Write the dodsrc credentials file
    with open(DODS_FILE_PATH, 'w') as dods_file:
        dods_file.write(DODS_FILE_CONTENTS)

    print('[INFO] Security credentials set up.')
    return True


def get_nc_dataset(url, var_id):
    """
    Open a remote connection to a NetCDF4 Dataset at `url`.
    Show information about variable `var_id`.
    Print metadata / data in the file and return the Dataset object.

    :param url: URL to a NetCDF OpenDAP end-point.
    :param var_id: Variable ID in NetCDF file [string]
    :return: netCDF4 Dataset object
    """
    dataset = Dataset(url)

    print('\n[INFO] Global attributes:')
    for attr in dataset.ncattrs():
        print('\t{}: {}'.format(attr, dataset.getncattr(attr)))

    print('\n[INFO] Variables:\n{}'.format(dataset.variables))
    print('\n[INFO] Dimensions:\n{}'.format(dataset.dimensions))

    print('\n[INFO] Min and max of variable: {}'.format(var_id))
    variable = dataset.variables[var_id][:]
    # Not every variable has a "units" attribute, so fall back to an empty string
    units = getattr(dataset.variables[var_id], 'units', '')
    print('\tMin: {:.6f} {}; Max: {:.6f} {}'.format(variable.min(), units, variable.max(), units))

    return dataset


def main(nc_file_url, var_id):
    """
    Main controller function.

    :param nc_file_url: URL to a NetCDF4 opendap end-point.
    :param var_id: Variable ID [String]
    :return: None
    """
    setup_credentials(force=False)
    ds = get_nc_dataset(nc_file_url, var_id)


if __name__ == '__main__':

    args = sys.argv[1:3]
    if len(args) != 2:
        sys.exit('Usage: python remote_nc_reader.py <url> <var_id>')
    main(*args)

Using the script

To use the script, you will need to make sure that Python and the required dependencies (see below) are available, and that you have a valid CEDA account with access to the URL that you intend to connect to. The script could be used as follows:

$ export CEDA_USERNAME=some_user
$ export CEDA_PASSWORD=some_password
$ python remote_nc_reader.py http://data.ceda.ac.uk/badc/ukcp18/data/marine-sim/skew-trend/rcp85/skewSurgeTrend/latest/skewSurgeTrend_marine-sim_rcp85_trend_2007-2099.nc skewSurgeTrend

The first two lines set your CEDA username and password as environment variables, which the "remote_nc_reader.py" script reads. The two command-line arguments are:

  1. The URL to a remote NetCDF file (on an OpenDAP server).
  2. The variable ID in that NetCDF file that you want to interrogate. 
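Both environment variables must be set before the script runs, otherwise it cannot authenticate. A check along the lines below reports every missing variable at once with a readable message. "get_ceda_credentials()" is a hypothetical helper, not part of the original script, and it accepts the environment as a parameter so that it is easy to test:

```python
import os


def get_ceda_credentials(environ=None):
    """Return (username, password) from CEDA_USERNAME / CEDA_PASSWORD.

    Hypothetical helper: raises RuntimeError naming every unset or empty
    variable, instead of stopping at the first one with a KeyError.
    """
    if environ is None:
        environ = os.environ
    names = ('CEDA_USERNAME', 'CEDA_PASSWORD')
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError('Please set: {}'.format(', '.join(missing)))
    return environ['CEDA_USERNAME'], environ['CEDA_PASSWORD']
```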

Explanation of the script

The script has several stages that deserve some explanation. This section shows each function and explains how it works, starting with the "main()" function, which is where the script starts:

def main(nc_file_url, var_id):
    """
    Main controller function.

    :param nc_file_url: URL to a NetCDF4 opendap end-point.
    :param var_id: Variable ID [String]
    :return: None
    """
    setup_credentials(force=False)
    ds = get_nc_dataset(nc_file_url, var_id)

The "main()" function takes two arguments from the command-line: (1) the URL to a remote NetCDF file (on an OpenDAP server) and (2) the variable ID in that NetCDF file that you want to interrogate. 
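The same two positional arguments could also be declared with Python's standard "argparse" module, which produces a usage message and "--help" text automatically and exits with an error when an argument is missing. A minimal sketch; "build_parser()" is a hypothetical name, not part of the original script:

```python
import argparse


def build_parser():
    """Hypothetical argparse front end for remote_nc_reader.py."""
    parser = argparse.ArgumentParser(
        description='Read a NetCDF file remotely via OpenDAP.')
    parser.add_argument('url', help='URL of a NetCDF file on an OpenDAP server')
    parser.add_argument('var_id', help='ID of the variable to inspect')
    return parser


# In the script, the entry point would then become:
#     args = build_parser().parse_args()
#     main(args.url, args.var_id)
```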

Before the "main()" function attempts to contact the remote OpenDAP server, it calls the function "setup_credentials()":

def setup_credentials(force=False):
    """
    Download and create required credentials files.

    Return True if credentials were set up.
    Return False if credentials were already set up.

    :param force: boolean
    :return: boolean
    """
    # Re-get credentials only if the DODS file is missing, `force` is True,
    # or the certificate is missing or out of date.
    if os.path.isfile(DODS_FILE_PATH) and not force and cert_is_valid(CREDENTIALS_FILE_PATH):
        print('[INFO] Security credentials already set up.')
        return False

    onlineca_client = OnlineCaClient()
    onlineca_client.ca_cert_dir = TRUSTROOTS_DIR

    # Set up trust roots
    trustroots = onlineca_client.get_trustroots(
        TRUSTROOTS_SERVICE,
        bootstrap=True,
        write_to_ca_cert_dir=True)

    # Write certificate credentials file
    key_pair, certs = onlineca_client.get_certificate(
        username,
        password,
        CERT_SERVICE,
        pem_out_filepath=CREDENTIALS_FILE_PATH)

    # Write the dodsrc credentials file
    with open(DODS_FILE_PATH, 'w') as dods_file:
        dods_file.write(DODS_FILE_CONTENTS)

    print('[INFO] Security credentials set up.')
    return True
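The ".dodsrc" file written by "setup_credentials()" is read by the netCDF library's OpenDAP client: "HTTP.SSL.CERTIFICATE" and "HTTP.SSL.KEY" name the client certificate and private key (both stored in the single "credentials.pem" file), "HTTP.SSL.CAPATH" names the directory of trusted CA certificates, and "HTTP.COOKIEJAR" names a file for session cookies. We assume that relative paths in this file are resolved against the directory the script is run from, so absolute paths are the safer choice. The hypothetical helper below builds the file contents with absolute paths for a certificates directory laid out like the script's "~/.certs":

```python
import os


def make_dodsrc_contents(certs_dir, cookie_jar):
    """Build .dodsrc text using absolute paths (hypothetical helper).

    `certs_dir` is assumed to contain credentials.pem (certificate and key)
    and the ca-trustroots/ directory, as written by setup_credentials().
    """
    certs_dir = os.path.abspath(os.path.expanduser(certs_dir))
    credentials = os.path.join(certs_dir, 'credentials.pem')
    lines = [
        'HTTP.COOKIEJAR=' + os.path.abspath(os.path.expanduser(cookie_jar)),
        'HTTP.SSL.CERTIFICATE=' + credentials,  # client certificate
        'HTTP.SSL.KEY=' + credentials,          # private key, same PEM file
        'HTTP.SSL.CAPATH=' + os.path.join(certs_dir, 'ca-trustroots'),
    ]
    return '\n'.join(lines) + '\n'
```

For example, make_dodsrc_contents('~/.certs', './dods_cookies') reproduces the script's settings with every path made absolute.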

The "setup_credentials()" function makes sure that the required certificate files have been downloaded: the trust roots and your certificate are saved under "~/.certs", and the ".dodsrc" configuration file is written to your $HOME directory. Unless it is called with "force=True", it only downloads new credentials when ".dodsrc" is missing or the existing certificate is not valid. That validity check is done by the "cert_is_valid()" function, as follows:

def cert_is_valid(cert_file, min_lifetime=0):
    """
    Returns boolean - True if the certificate is in date.
    Optional argument min_lifetime is the number of seconds
    which must remain.

    :param cert_file: certificate file path.
    :param min_lifetime: minimum lifetime (seconds)
    :return: boolean
    """
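    try:
        # Read as bytes: load_pem_x509_certificate() expects bytes on Python 3
        with open(cert_file, 'rb') as f:
            crt_data = f.read()
    except IOError:
        return False

    try:
        cert = x509.load_pem_x509_certificate(crt_data, default_backend())
    except ValueError:
        return False

    # The certificate's validity dates are naive UTC datetimes, so compare with UTC now
    now = datetime.datetime.utcnow()

    return (cert.not_valid_before <= now
            and cert.not_valid_after > now + datetime.timedelta(0, min_lifetime))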

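The "cert_is_valid()" function loads the existing credentials file (if there is one) and checks that the certificate in it has not expired. It returns "True" only if the certificate is already valid and will remain valid for at least the number of seconds given by the optional second argument, "min_lifetime" (default 0). A missing or unreadable file also gives "False".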
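The date comparison inside "cert_is_valid()" can be separated out and tested without a real certificate. Note that the "cryptography" library's "not_valid_before" and "not_valid_after" attributes are naive datetimes in UTC, so the current time should come from "datetime.datetime.utcnow()" rather than "datetime.datetime.now()", which returns local time (recent "cryptography" releases also offer timezone-aware "not_valid_before_utc" and "not_valid_after_utc"). A minimal sketch, with "is_within_validity()" as a hypothetical name:

```python
import datetime


def is_within_validity(not_before, not_after, min_lifetime=0, now=None):
    """Return True if `now` is inside the validity window and at least
    `min_lifetime` seconds remain before `not_after`.

    All datetimes are naive UTC, matching cryptography's
    not_valid_before / not_valid_after attributes.
    """
    if now is None:
        now = datetime.datetime.utcnow()
    required_remaining = datetime.timedelta(seconds=min_lifetime)
    return not_before <= now and not_after > now + required_remaining
```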

Once the credentials files are in place and valid, the "main()" function calls the "get_nc_dataset()" function as follows:

def get_nc_dataset(url, var_id):
    """
    Open a remote connection to a NetCDF4 Dataset at `url`.
    Show information about variable `var_id`.
    Print metadata / data in the file and return the Dataset object.

    :param url: URL to a NetCDF OpenDAP end-point.
    :param var_id: Variable ID in NetCDF file [string]
    :return: netCDF4 Dataset object
    """
    dataset = Dataset(url)

    print('\n[INFO] Global attributes:')
    for attr in dataset.ncattrs():
        print('\t{}: {}'.format(attr, dataset.getncattr(attr)))

    print('\n[INFO] Variables:\n{}'.format(dataset.variables))
    print('\n[INFO] Dimensions:\n{}'.format(dataset.dimensions))

    print('\n[INFO] Min and max of variable: {}'.format(var_id))
    variable = dataset.variables[var_id][:]
    # Not every variable has a "units" attribute, so fall back to an empty string
    units = getattr(dataset.variables[var_id], 'units', '')
    print('\tMin: {:.6f} {}; Max: {:.6f} {}'.format(variable.min(), units, variable.max(), units))

    return dataset

The "get_nc_dataset()" function is provided to demonstrate that you can interact with a secured NetCDF file on a remote OpenDAP server using the same interface (netCDF4-python) that you would use on a local file. In this example the following information is printed:

  • Global attributes
  • Variables
  • Dimensions
  • Min and max of requested variable

The "get_nc_dataset()" function returns the netCDF4 "Dataset" object, which you can interrogate further as required.
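The "get_nc_dataset()" function reads the whole variable with "[:]" before taking its minimum and maximum, which transfers every value over the network. For a large variable you may prefer to request it in slices along the first dimension, since typically only the requested slice is sent by the server. The sketch below assumes a netCDF4 "Variable" with at least one dimension, whose slices are masked arrays; "chunked_min_max()" and its "chunk_size" default are hypothetical choices, not part of the original script.

```python
def chunked_min_max(variable, chunk_size=100):
    """Return (min, max) of `variable`, reading it in slices along axis 0.

    Hypothetical helper: `variable` is assumed to be a netCDF4 Variable (or any
    array-like with `.shape` whose slices provide .min() and .max()).
    Returns (None, None) if every slice is fully masked.
    """
    n = variable.shape[0]
    lo = hi = None
    for start in range(0, n, chunk_size):
        block = variable[start:start + chunk_size]  # only this slice is requested
        block_min, block_max = block.min(), block.max()
        # A fully masked slice yields numpy.ma.masked, whose .mask is True: skip it
        if getattr(block_min, 'mask', False):
            continue
        lo = block_min if lo is None else min(lo, block_min)
        hi = block_max if hi is None else max(hi, block_max)
    return lo, hi
```

For example, chunked_min_max(dataset.variables[var_id], chunk_size=50) could replace the "[:]" read in "get_nc_dataset()".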

Setting up an environment to run the python script

The above script requires a few Python packages. To make sure they are available, you may need to create your own "virtual environment" and install the required packages as follows:

$ virtualenv venv
$ source venv/bin/activate
$ pip install ContrailOnlineCAClient requests pip==9.0.3 netCDF4 cryptography

Once the virtual environment has been created, you can reuse it the next time you log in by simply activating it with:

$ source venv/bin/activate
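To confirm that the activated environment provides the third-party modules used by "remote_nc_reader.py", you could run a quick check such as the one below. "missing_modules()" is a hypothetical helper; the module names are taken from the script's import statements, plus "requests" from the install command.

```python
import importlib


def missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == '__main__':
    required = ['netCDF4', 'cryptography', 'requests',
                'contrail.security.onlineca.client']
    print(missing_modules(required) or 'All required modules found.')
```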
