Accessing CEDA ElasticSearch Services from the Met Office

These instructions give a simple overview of how to query the CEDA ElasticSearch (ES) service from Met Office Linux servers. There are three stages:

Create a Python virtual environment and install the "elasticsearch" library (only once)
Activate the virtual environment (each time you want to query ES)
Run an example query

Step 1: Create a Python virtual environment and install the "elasticsearch" library (only once)

Create a new working directory, then create and activate a virtual environment inside it:

cd ~/
mkdir es-test
cd es-test/
module load scitools
python -m venv venv --system-site-packages
source venv/bin/activate
pip install "elasticsearch<8"   # Step 3 uses RequestsHttpConnection, which is not available in version 8+ of the client
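To confirm that the installed client is one the Step 3 example can use, you can check its version from inside the activated environment. This is a minimal sketch using the standard-library `importlib.metadata`; the function names are illustrative, and the "below version 8" rule is an assumption based on the example's use of `RequestsHttpConnection`, which the 8.x client no longer provides.

```python
from importlib.metadata import PackageNotFoundError, version


def is_compatible(client_version: str) -> bool:
    """Return True if the client's major version is below 8.

    Assumption: the Step 3 example relies on RequestsHttpConnection,
    which the 8.x client no longer provides.
    """
    major = int(client_version.split(".")[0])
    return major < 8


def check_elasticsearch_install() -> str:
    """Report the installed elasticsearch client version and whether it is compatible."""
    try:
        installed = version("elasticsearch")
    except PackageNotFoundError:
        return "elasticsearch is not installed in this environment"
    if is_compatible(installed):
        return f"elasticsearch {installed} is installed (OK)"
    return f"elasticsearch {installed} is installed, but the example needs a version below 8"


if __name__ == "__main__":
    print(check_elasticsearch_install())
```

Save it as, for example, `check_install.py` and run `python check_install.py` with the environment active.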

Step 2: Activate the virtual environment (each time you want to query ES)

Each time you start a new terminal session, return to the working directory and re-activate the environment:

cd ~/es-test
module load scitools
source venv/bin/activate
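If you are unsure whether the environment is active, Python can tell you: inside a virtual environment created with `venv`, `sys.prefix` points at the environment directory while `sys.base_prefix` still points at the base installation. A small sketch (the function name is illustrative):

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv-created virtual environment."""
    # venv sets sys.prefix to the environment directory; sys.base_prefix
    # keeps pointing at the interpreter the environment was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active; run 'source venv/bin/activate' first")
```

Equivalently, `which python` should report a path inside `~/es-test/venv/bin/` once the environment is active.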

Step 3: Run an example query

Save the following code to a file called `test_query.py`:

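from elasticsearch import Elasticsearch, RequestsHttpConnection

# Web proxy used on the Met Office Linux desktop (see the `proxy_pac` setting).
# Note: requests expects a proxy URL such as "http://host:port"; it does not
# read proxy auto-configuration (.pac) files, so adjust this value if needed.
PROXY = "webgate/proxy.pac"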

# Connection class that applies a web-proxy setting to the underlying requests session
class MyConnection(RequestsHttpConnection):

    def __init__(self, *args, **kwargs):
        # Remove the custom "proxies" argument before the parent class sees it
        proxies = kwargs.pop("proxies", {})
        super().__init__(*args, **kwargs)
        self.session.proxies = proxies

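# Create a test query and run it
def run_query():
    es_url = "https://jasmin-es1.ceda.ac.uk"

    # Set up the connection. requests chooses a proxy by the scheme of the target
    # URL, so an "https" entry is needed for this https:// endpoint.
    es = Elasticsearch([es_url], connection_class=MyConnection,
                       proxies={"http": PROXY, "https": PROXY})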

    # Match files under one archive directory and sum their sizes
    query = {
        "query": {
            "match_phrase_prefix": {
                "info.directory.analyzed": "/badc/ukmo-nimrod/data/single-site/munduff-hill"
            }
        },
        "aggs": {
            "total_vol": {
                "sum": {
                    "field": "info.size"
                }
            }
        }
    }

    # Run query to get results object
    results = es.search(index="ceda-fbi", body=query)

    # Get some content from the results
    data = results["hits"]["hits"]

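    # Print some outputs, including the summed file size from the aggregation
    print(f'Total Results: {results["hits"]["total"]}')
    print(f'Total volume (sum of info.size): {results["aggregations"]["total_vol"]["value"]}')
    print(f'JSON returned for {len(data)} objects.')
    print(f'Example content: {data}')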
    return data

run_query()
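The `proxies` dictionary handed to `MyConnection` ends up on a `requests` session, and `requests` picks a proxy according to the scheme of the URL being requested. Because `es_url` uses `https`, a dictionary containing only an `"http"` entry is not applied to these requests. The hypothetical helper below builds a dictionary covering both schemes from a single proxy URL; the address `http://proxy.example:8080` is a placeholder, not a Met Office setting.

```python
from typing import Dict


def make_proxies(proxy_url: str) -> Dict[str, str]:
    """Build a requests-style proxies mapping that covers both URL schemes.

    requests looks up the proxy by the scheme of the *target* URL, so an
    https:// Elasticsearch endpoint needs an "https" entry.
    """
    if not proxy_url:
        raise ValueError("proxy_url must be a non-empty string")
    return {"http": proxy_url, "https": proxy_url}


# Hypothetical placeholder address; substitute your site's proxy URL.
proxies = make_proxies("http://proxy.example:8080")
print(proxies)  # {'http': 'http://proxy.example:8080', 'https': 'http://proxy.example:8080'}
```

In `run_query`, `proxies=make_proxies(PROXY)` could then stand in for the literal dictionary passed to `Elasticsearch(...)`.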

Run the code (with the virtual environment active) using:

python test_query.py
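Once the example works, the query and the result handling can be pulled out into functions so that other archive directories can be summarised. The sketch below is hypothetical: `build_volume_query` and `summarise_response` are illustrative names, it assumes the `ceda-fbi` documents expose `info.directory.analyzed` and `info.size` as in the example, and it accepts `hits.total` either as a plain number or as the `{"value": ..., "relation": ...}` object returned by Elasticsearch 7 and later. It needs no server, so it is exercised here on a hand-written response dictionary rather than real results.

```python
from typing import Any, Dict


def build_volume_query(directory: str) -> Dict[str, Any]:
    """Return the example's query for a given archive directory.

    Matches files whose analysed directory path starts with `directory`
    and sums their info.size values into a "total_vol" aggregation.
    """
    return {
        "query": {
            "match_phrase_prefix": {"info.directory.analyzed": directory}
        },
        "aggs": {"total_vol": {"sum": {"field": "info.size"}}},
    }


def summarise_response(results: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the hit count, number of returned hits and summed size from a search response."""
    total = results["hits"]["total"]
    # Elasticsearch 7+ reports {"value": n, "relation": "eq"}; older versions a bare number.
    hit_count = total["value"] if isinstance(total, dict) else total
    return {
        "total_hits": hit_count,
        "returned": len(results["hits"]["hits"]),
        "total_volume": results.get("aggregations", {}).get("total_vol", {}).get("value"),
    }


# A hand-written response shaped like a search result, for illustration only.
sample = {
    "hits": {"total": {"value": 3, "relation": "eq"}, "hits": [{"_source": {}}] * 3},
    "aggregations": {"total_vol": {"value": 1024.0}},
}
print(summarise_response(sample))
```

With these in place, the search call in `run_query` would become `es.search(index="ceda-fbi", body=build_volume_query(path))` for any directory `path`.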