MIDAS Open User Guide
MIDAS Open is one of the most popular datasets in the CEDA Archive, but it is not the easiest to use. The following guide covers a few aspects of this dataset collection to aid general use of the data, including various FAQs from our user community.
What is MIDAS Open?
MIDAS Open is a collection of observation datasets made available by the Met Office each year under the Open Government Licence. The collection contains data from UK meteorological sites, from around the late 19th century to recent years (up to the end of the previous full year for each release), as stored in the Met Office's 'MIDAS' database. MIDAS Open is not the database itself, however: it is a series of flat files covering a subset of the restricted-access full MIDAS database, taken as a 'snapshot' at a given point in time, with the files formatted and the data split up to suit general use. You can read more in our news article from when these data were first released: UK weather station records now freely available to all: MIDAS Open
How does it compare with the full MIDAS collection?
The full MIDAS database holds meteorological data from a wide range of reporting networks operating both globally and nationally, including marine meteorological observations. It also holds data from sites not operated by the Met Office (e.g. rain gauges operated by water authorities). The MIDAS Open data are a subset of these data, covering UK land-based stations operated by the Met Office. Currently this represents approximately 95% of the available daily temperature and weather observations, 83% of the hourly weather data and 13% of the daily rainfall data within the full MIDAS collection. A large proportion of the UK rain gauge network is operated by other agencies, so those observations are currently excluded from MIDAS Open. MIDAS Open does not supersede the full MIDAS collection, which is also archived at CEDA.
Due to licensing restrictions, access to the 'fuller' MIDAS collection held in the CEDA Archive is more restricted and is primarily aimed at supporting academic use.
For more information about the fuller MIDAS collection see the MIDAS Dataset Collection and related dataset records in the CEDA Data Catalogue.
Where are the latest data? Why are some data missing?
Not all sites have data for the full time period covered by the collection: MIDAS holds both historical and more recent data, so it includes data from stations that are now closed. There may also be other reasons for gaps or missing data at a site. Finally, 2020 data aren't available in MIDAS Open at present; these will be available in the next release, which is due around July this year. You may, however, be able to request more recent data via the National Meteorological Library and Archive service. For details, see the last question in the MIDAS Meteorological Data: FAQ.
How do I find the station/data I need?
How are the data structured?
The data in the MIDAS Open collection are split at various levels to aid use of these vast volumes of data. Essentially, the archive structure is:
- ukmo-midas-open is the top 'collection' level, found in the archive under the '/badc' directory, whether viewed via the web download service (data.ceda.ac.uk), the FTP download service (ftp.ceda.ac.uk) or directly on the JASMIN system.
- data|metadata|checksums directory - at this level you can find the md5 checksums provided by the Met Office for each release after the first; these can be used to verify downloaded data if needed. The metadata directory contains files used by a 'midas-open' station search map tool [note: we're looking to incorporate this information into our general MIDAS Station search tool in due course]. The data directory takes you down to where the datasets themselves are archived.
- dataset - these folders contain the different dataset splits for MIDAS Open, reflecting how the data are split up within the Met Office's MIDAS database.
- release-version - each year a new release of each dataset is made available, incorporating data from the previous completed year; each release is labelled with its year and month of release. For example, the 202007 release (July 2020) contains data up to the end of 2019. A release may also contain new and updated data for previous years; see the associated release details for more information about each change. Within these directories you will also find 00README_catalogue_and_licence.txt files, which give outline information from the associated dataset catalogue page; the link in the file will take you to the dataset's catalogue entry, where you can find lots of useful information.
- historic-county - MIDAS uses historic county boundaries so that stations are grouped consistently across all years and the structure is future-proofed against further boundary changes. The map of traditional counties may help users work out which directory contains their station of interest.
- site - each station in MIDAS is assigned a dedicated MIDAS 'src_id' (source ID). This ID brings together the station's data as reported within different reporting networks, where the site may be designated with various other IDs, making it easier to find data across the different datasets and to link them with the station metadata.
- qc-version - see below for more on this important distinction, which denotes how data are handled within the MIDAS system when quality control checks are run and/or data are altered manually after being received into the MIDAS system.
- files - the data for each site are split into yearly files to keep individual downloads small. The files themselves are formatted as 'BADC-CSV' files; see the BADC-CSV Format for Data Exchange help page for more information on using this format. Essentially, however, these are comma-separated values files that should open readily in common spreadsheet packages, text editors and scripts. They contain two main sections: a header section (starting with a 'Conventions' line) and a data section (starting with a 'data' line and finishing with an 'end data' line). The header section gives details about the overall file contents (so-called 'global' attributes, marked with a G in the second column) and details about each column of data.
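As a minimal sketch of the checksum check mentioned under the data|metadata|checksums entry above, the Python below computes a file's MD5 digest with the standard hashlib module and compares it with an expected value. It does not assume any particular layout for the Met Office checksum files: you copy the expected hash for your file from the relevant checksum file yourself. The function names md5_of_file and verify_download are illustrative only and are not part of any CEDA tool.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 matches the expected hash (case and whitespace ignored)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

Reading in chunks means even large downloads can be checked without loading the whole file into memory.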
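The release-version labelling described in the list above can also be read by a script. The sketch below is a hypothetical helper that assumes only that a release directory name contains the six-digit YYYYMM code (e.g. 202007); any surrounding text in the name is ignored. It returns the release year and month plus the last full year of data the release should cover, following the rule that each release runs up to the end of the previous full year.

```python
import re


def release_coverage(release_name):
    """Return (release_year, release_month, last_full_year) for a name containing YYYYMM."""
    match = re.search(r"(\d{4})(\d{2})", release_name)
    if match is None:
        raise ValueError(f"no YYYYMM release code found in {release_name!r}")
    year, month = int(match.group(1)), int(match.group(2))
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month in {release_name!r}")
    # Each release contains data up to the end of the previous full year.
    return year, month, year - 1


def latest_release(release_names):
    """Pick the release name with the most recent (year, month) code."""
    return max(release_names, key=lambda name: release_coverage(name)[:2])
```

Because any release can also carry updates to earlier years, the last full year tells you only where coverage ends, not which older data changed; the release details remain the place to check that.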
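For scripted use, the two sections of a BADC-CSV file described in the files entry can be separated with Python's standard csv module. The reader below is a hypothetical sketch, not a CEDA-provided tool. It collects header lines marked G as global attributes, keys the other header lines by the column reference in their second field, and returns each data row as a dictionary. It assumes the first line after 'data' holds the column names, so check this against your own files and the BADC-CSV help page before relying on it.

```python
import csv
import io


def read_badc_csv(text):
    """Split BADC-CSV text into global attributes, column attributes and data rows.

    Assumption: the first line after the 'data' line holds the column names.
    """
    global_attrs = {}  # label -> list of value lists, from header lines marked 'G'
    column_attrs = {}  # column reference -> {label: values}, from other header lines
    rows = []          # one dict per data line, keyed by column name
    header = None
    in_data = False
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        label = record[0].strip()
        if not in_data:
            if label == "data":
                in_data = True  # the header section ends here
            elif len(record) >= 2:
                reference = record[1].strip()
                values = [value.strip() for value in record[2:]]
                if reference == "G":
                    global_attrs.setdefault(label, []).append(values)
                else:
                    column_attrs.setdefault(reference, {})[label] = values
            continue
        if label == "end data":
            break  # the data section ends here
        if header is None:
            header = [name.strip() for name in record]
        else:
            rows.append(dict(zip(header, (value.strip() for value in record))))
    return global_attrs, column_attrs, rows
```

All values come back as strings, so convert them (and handle blank entries) as your analysis requires. To read a downloaded file, pass its contents in, e.g. `read_badc_csv(open(path, encoding="utf-8", newline="").read())`; the encoding here is an assumption and may need adjusting.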
What are the 'qc-version' directories all about?
When you reach the data directories, you'll find the actual files are in sub-folders labelled 'qc-version-#', according to which 'version' of the data line is available in MIDAS. The 'qc-version-1' directory holds the latest state of the data lines, which may be original data or post-QC data lines. The state of quality control is given by the values in the corresponding _q columns: 0 means no QC had been run on the data when the file was produced, while other values need to be cross-referenced with the documentation. If the QC processes at the Met Office result in a change to a data line, the original line (i.e. the one first received by the Met Office from the site) is stored in the 'qc-version-0' files.
You can find out more about the QC information in section 5 of the MIDAS User Guide.
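As an illustration of reading the _q columns, the hypothetical helper below takes one data row as a dictionary of strings (for instance, a row produced by csv.DictReader) and labels each column whose name ends in '_q'. It encodes only the rule stated above, that 0 means no QC had run; every other non-blank value is flagged for cross-referencing with the MIDAS User Guide rather than interpreted.

```python
def qc_flag_summary(row):
    """Label each '_q' column in a data row by what its value tells us about QC.

    '0' means no QC had run when the file was produced; any other non-blank
    value must be looked up in the MIDAS documentation.
    """
    summary = {}
    for column, value in row.items():
        if not column.endswith("_q"):
            continue  # only quality-control flag columns are of interest
        value = (value or "").strip()
        if value == "":
            summary[column] = "blank"
        elif value == "0":
            summary[column] = "no QC run"
        else:
            summary[column] = "check documentation"
    return summary
```

The helper deliberately avoids decoding non-zero flags, since their meanings are defined in the MIDAS documentation rather than here.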
Are there any tools to use the data? What about the CEDA WPS MIDAS Tools?
At the moment, I'm afraid, CEDA do not have additional tools for working with the MIDAS Open data themselves, unlike the 'MIDAS Extractor' tools within the CEDA Web Processing Service (WPS), which work only with data in the full MIDAS collection. Those existing tools cannot be adapted for the MIDAS Open data because the two dataset collections are structured differently.