Accessing CEDA ElasticSearch Services from the Met Office
These instructions provide a simple overview for querying the CEDA ElasticSearch (ES) service from Met Office Linux servers. There are three stages:
- Create a Python virtual environment and install the "elasticsearch" library (only once)
- Activate the virtual environment (each time you want to query ES)
- Run an example query
Step 1: Create a Python virtual environment and install the "elasticsearch" library (only once)
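Create a new working directory, then create a virtual environment inside it and install the library. The example in Step 3 imports `RequestsHttpConnection`, which the "elasticsearch" library only provides in versions before 8.0, so the install is pinned below that version:
cd ~/
mkdir es-test
cd es-test/
module load scitools
python -m venv venv --system-site-packages
source venv/bin/activate
pip install "elasticsearch<8"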
Step 2: Activate the virtual environment (each time you want to query ES)
Each time you start a new terminal session, change to the working directory and re-activate the environment:
cd ~/es-test/
module load scitools
source venv/bin/activate
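To confirm that the environment is set up as expected, a short check along the following lines may help. This is a minimal sketch using only the Python standard library (Python 3.8 or later is assumed); the function name `check_environment` is hypothetical and not part of the CEDA or Met Office tooling.

```python
import sys
from importlib import metadata


def check_environment(package="elasticsearch"):
    """Report whether a virtual environment is active and which
    version of `package` (if any) is installed in it."""
    # Inside a venv, sys.prefix points at the venv directory, while
    # sys.base_prefix points at the interpreter it was created from.
    in_venv = sys.prefix != sys.base_prefix
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None
    return {"in_venv": in_venv, "version": version}


if __name__ == "__main__":
    status = check_environment()
    print(f"Virtual environment active: {status['in_venv']}")
    print(f"elasticsearch version: {status['version'] or 'not installed'}")
```

If the first line reports `False`, the `source venv/bin/activate` command has probably not been run in the current terminal session.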
Step 3: Run an example query
Save the following code to a file called `test_query.py`:
from elasticsearch import Elasticsearch
from elasticsearch import RequestsHttpConnection

# Proxy setting: `proxy_pac` in the Linux desktop environment
PROXY = "webgate/proxy.pac"


# Connection class that allows a web-proxy setting to be specified
class MyConnection(RequestsHttpConnection):
    def __init__(self, *args, **kwargs):
        # Remove `proxies` before calling the parent constructor,
        # then attach it to the underlying requests session
        proxies = kwargs.pop("proxies", {})
        super().__init__(*args, **kwargs)
        self.session.proxies = proxies


# Create a test query and run it
def run_query():
    es_url = "https://jasmin-es1.ceda.ac.uk"

    # Set up the connection.
    # Note: requests matches proxies by URL scheme; add an "https" key
    # if the proxy should also apply to HTTPS requests.
    es = Elasticsearch([es_url], connection_class=MyConnection, proxies={"http": PROXY})

    # Match files under one directory and sum their sizes
    query = {
        "query": {
            "match_phrase_prefix": {
                "info.directory.analyzed": "/badc/ukmo-nimrod/data/single-site/munduff-hill"
            }
        },
        "aggs": {
            "total_vol": {
                "sum": {
                    "field": "info.size"
                }
            }
        }
    }

    # Run query to get results object
    results = es.search(index="ceda-fbi", body=query)

    # Get some content from the results
    data = results["hits"]["hits"]

    # Print some outputs
    print(f'Total Results: {results["hits"]["total"]}')
    print(f'JSON returned for {len(data)} objects.')
    print(f'Example content: {data}')
    return data


if __name__ == "__main__":
    run_query()
Run the code using:
python test_query.py
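The example query also requests a `total_vol` sum aggregation, which the script above does not print. The following is a minimal sketch of how the results object might be summarised, assuming it behaves like a plain dictionary and that `info.size` is recorded in bytes. The helper names `summarise_results` and `format_bytes` and the sample response are hypothetical and for illustration only.

```python
def summarise_results(results, agg_name="total_vol"):
    """Extract the hit count and the summed size from a search response."""
    hits = results.get("hits", {})
    total = hits.get("total", 0)
    # Elasticsearch 7+ reports the total as {"value": N, "relation": ...};
    # earlier versions report a plain integer.
    if isinstance(total, dict):
        total = total.get("value", 0)
    agg = results.get("aggregations", {}).get(agg_name, {})
    return {
        "total_hits": total,
        "returned": len(hits.get("hits", [])),
        "total_bytes": agg.get("value"),  # None if the aggregation is absent
    }


def format_bytes(num_bytes):
    """Render a byte count using binary units (assumes sizes are in bytes)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024


if __name__ == "__main__":
    # Hypothetical response, shaped like an Elasticsearch 7 reply
    sample = {
        "hits": {"total": {"value": 2, "relation": "eq"}, "hits": [{}, {}]},
        "aggregations": {"total_vol": {"value": 3072.0}},
    }
    summary = summarise_results(sample)
    print(f"Total hits: {summary['total_hits']}")
    print(f"Total volume: {format_bytes(summary['total_bytes'])}")
```

In `test_query.py`, `summarise_results(results)` could be called straight after `es.search(...)` to report the total data volume alongside the hit count.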