Accessing CEDA ElasticSearch Services from the Met Office
These instructions give a short overview of how to query the CEDA ElasticSearch (ES) service from Met Office Linux servers. There are three steps:
- Create a Python virtual environment and install the "elasticsearch" library (only once)
- Activate the virtual environment (each time you want to query ES)
- Run an example query
Step 1: Create a Python virtual environment and install the "elasticsearch" library (only once)
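Create a new working directory and a virtual environment inside it, then install the library. The example in Step 3 imports `RequestsHttpConnection`, which was removed in version 8 of the Python client, so the install is pinned to an earlier release:

    cd ~/
    mkdir es-test
    cd es-test/
    module load scitools
    python -m venv venv --system-site-packages
    source venv/bin/activate
    pip install "elasticsearch<8"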
Step 2: Activate the virtual environment (each time you want to query ES)
Each time you start a new terminal session, you will need to re-activate the environment:

    cd ~/es-test
    module load scitools
    source venv/bin/activate
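
To confirm that the environment is active before running a query, you can check from Python itself. This is a minimal sketch rather than part of the original instructions: `in_virtualenv` and `has_package` are hypothetical helper names. It relies on the fact that inside a `venv` environment `sys.prefix` differs from `sys.base_prefix`.

```python
import importlib.util
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a venv."""
    # In a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the underlying Python installation.
    return sys.prefix != sys.base_prefix


def has_package(name):
    """Return True when a top-level package can be found for import."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print(f"Virtual environment active: {in_virtualenv()}")
    print(f"elasticsearch installed: {has_package('elasticsearch')}")
```

If either line reports `False`, repeat the activation commands (or Step 1 if the library is missing).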
Step 3: Run an example query
Save the following code to a file called `test_query.py`:

    from elasticsearch import Elasticsearch, RequestsHttpConnection

    # `proxy_pac` in the linux desktop environment
    PROXY = "webgate/proxy.pac"


    # Create a Connection class that allows you to specify a web-proxy setting
    class MyConnection(RequestsHttpConnection):
        def __init__(self, *args, **kwargs):
            proxies = kwargs.pop("proxies", {})
            super().__init__(*args, **kwargs)
            self.session.proxies = proxies


    # Create a test query and run it
    def run_query():
        es_url = "https://jasmin-es1.ceda.ac.uk"

        # Set up the connection. Note: requests selects a proxy by URL scheme,
        # so an "https" entry may also be needed for this HTTPS endpoint.
        es = Elasticsearch([es_url], connection_class=MyConnection,
                           proxies={"http": PROXY})

        # Match records under one directory and sum their file sizes
        query = {
            "query": {
                "match_phrase_prefix": {
                    "info.directory.analyzed": "/badc/ukmo-nimrod/data/single-site/munduff-hill"
                }
            },
            "aggs": {
                "total_vol": {
                    "sum": {
                        "field": "info.size"
                    }
                }
            }
        }

        # Run query to get results object
        results = es.search(index="ceda-fbi", body=query)

        # Get some content from the results
        data = results["hits"]["hits"]

        # Print some outputs
        print(f'Total Results: {results["hits"]["total"]}')
        print(f'JSON returned for {len(data)} objects.')
        print(f'Example content: {data}')

        return data


    if __name__ == "__main__":
        run_query()
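
The query has two parts: a `match_phrase_prefix` clause that selects records whose directory starts with a given path, and a `sum` aggregation over `info.size`. To reuse it for other directories, the dictionary could be built by a small function. This is only a sketch: `build_query` is a hypothetical helper, and the field names are copied from the example above rather than from any index documentation.

```python
import json


def build_query(directory):
    """Build the example query body for an arbitrary directory prefix."""
    return {
        # Select records whose analysed directory path starts with `directory`
        "query": {
            "match_phrase_prefix": {"info.directory.analyzed": directory}
        },
        # Sum the file sizes of all matching records
        "aggs": {
            "total_vol": {"sum": {"field": "info.size"}}
        },
    }


if __name__ == "__main__":
    example = build_query("/badc/ukmo-nimrod/data/single-site/munduff-hill")
    print(json.dumps(example, indent=2))
```

The returned dictionary can be passed as the `body` argument of `es.search` in place of the hard-coded `query`.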
Run the code using:

    python test_query.py
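
The script prints the raw hits, but the `total_vol` aggregation requested in the query is returned separately, under the `aggregations` key of the response. The sketch below extracts both numbers from a response dictionary. `summarise_results` is a hypothetical helper, and the sample response is invented placeholder data that only illustrates the expected shape. Because the server version is not stated here, the helper accepts `hits.total` either as an integer (older servers) or as a `{"value": ...}` object (Elasticsearch 7 and later).

```python
def summarise_results(results):
    """Return hit count, number of returned documents and summed size."""
    total = results["hits"]["total"]
    # Elasticsearch 7+ wraps the count in an object; older versions use an int
    if isinstance(total, dict):
        total = total["value"]

    # The aggregation may be absent if the query did not request it
    agg = results.get("aggregations", {}).get("total_vol", {})

    return {
        "total_hits": total,
        "returned": len(results["hits"]["hits"]),
        "total_bytes": agg.get("value"),
    }


if __name__ == "__main__":
    # Invented sample response, used only to demonstrate the structure
    sample = {
        "hits": {"total": {"value": 2, "relation": "eq"}, "hits": [{}, {}]},
        "aggregations": {"total_vol": {"value": 2048.0}},
    }
    print(summarise_results(sample))
```

Inside `run_query`, calling `summarise_results(results)` after the search would give the total volume alongside the hit count.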