BADC-CSV Format for Data Exchange
- Format History: An Alternative to NASA-Ames
- BADC-CSV Format: for which type of data?
- File names
- Record structure
- Standards Compliance
- Concatenating data records
- Further reading
- Checking your data files
- Uploading BADC-CSV files to BADC
- Reading and understanding BADC-CSV data
This page gives a brief outline of the BADC-CSV file format, which is covered in greater depth in the "BADC-CSV Text File Guide for Users and Producers".
Note: for more details on how to read more generic comma-separated (csv) or tab-separated (tsv) data, please see our ASCII data format page.
The guide is available from the CEDA Documentary repository.
Format History: An Alternative to NASA-Ames
CEDA has used NASA-Ames formatted data for many years. NASA-Ames was devised primarily as a format for aircraft observations, but can be adapted for many kinds of atmospheric observation data. However, NASA-Ames is complex and confusing for users, who tend to strip the header off and import the text file into Excel. The metadata are generally not used in their machine-readable form, but are simply read by the researcher. Much effort is also expended supporting data producers in the creation of NASA-Ames files: producers see the format as complicated, and it cannot easily be produced from spreadsheet packages such as Excel. Additionally, the metadata fields offered by NASA-Ames are fixed and inflexible.
Model data stored at CEDA often use the NetCDF format with CF conventions. This provides a format framework with good, flexible metadata. The format can be read from a number of languages and analysis environments, including Fortran, Python, Matlab and IDL. It is, however, difficult for a researcher with little technical knowledge to use.
To solve these problems, a new file format was developed to bring the advantages of the NetCDF file format into a simple text file. The approach was to apply metadata conventions on top of comma separated values (CSV) files, as produced by applications like Excel.
BADC-CSV Format: for which type of data?
BADC-CSV has been designed specifically with simple, common data structures in mind, often referred to as 1-D data, such as:
- a collection of surface meteorological data from one or more stations over a period of time
- a time series of data from one instrument
For more complex data structures - e.g. variation in both time and height or model output - users should seek to use formats such as NetCDF which allow storage and access to such data much more readily than a csv file.
File names
File names for data files held by CEDA (including those in the BADC-CSV format) will adhere to the file naming convention described on the File Name page. In the case of BADC-CSV files the extension should be ".csv".
Record structure
Each file contains one or more data records, each of which is made up of two parts:
- header information at the top of the record, which describes the data itself and gives supplementary information that helps users understand how the data were produced and interpret them correctly (collectively known as metadata)
- the data itself, in the second part of the record
Each part of the record within the BADC-CSV file has various components. Some of these components are compulsory, while others are advisory. In addition, there is scope for the file creator to add further information within the metadata section. Each part of the data record is discussed below, while the components required to make the file compliant with various standards are discussed in the next section.
Header or Metadata section
The header includes, in a defined order and format, all the information needed to read and understand the data.
Data section
This includes two markers to indicate the start and end of the data, as well as the references that link the data to the relevant metadata tags defined in the header.
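As a rough illustration of this two-part structure, the Python sketch below splits one record into its header lines, its column references and its data rows. It assumes that each header line has the form label,reference,value(s), that the start marker is a line reading data followed by a row of column references, and that the end marker reads end data. The sample record and the function name parse_record are invented for illustration; the full guide remains the authoritative description of the layout.

```python
import csv
import io

# Hypothetical sample record, written under the assumptions stated above.
SAMPLE = """\
Conventions,G,BADC-CSV,1
title,G,Example air temperature series
long_name,1,time,days since 2007-03-14
coordinate_variable,1,x
long_name,2,air temperature,K
data
1,2
0.8,2.4
1.1,3.4
end data
"""


def parse_record(text):
    """Split one record into (header, column_refs, rows)."""
    header = []          # list of (label, reference, values)
    column_refs = None   # row linking each data column to a metadata reference
    rows = []
    in_data = False
    for fields in csv.reader(io.StringIO(text, newline="")):
        if not fields:
            continue  # skip blank lines
        first = fields[0].strip()
        if not in_data:
            # Assumed start marker: a line whose only non-empty field is "data".
            if first.lower() == "data" and not any(fields[1:]):
                in_data = True
            else:
                reference = fields[1].strip() if len(fields) > 1 else ""
                header.append((first, reference, fields[2:]))
        elif first.lower() == "end data":
            break  # assumed end marker
        elif column_refs is None:
            column_refs = [f.strip() for f in fields]
        else:
            rows.append(fields)
    return header, column_refs, rows


header, column_refs, rows = parse_record(SAMPLE)
print(column_refs)
print(len(header), "header lines,", len(rows), "data rows")
```

Values are left as strings here; converting them to numbers or dates is a separate step that depends on the metadata for each column.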
Standards Compliance
The BADC-CSV format has been structured so that, if all recommendations are followed, the file will comply with a number of standards, both for csv files themselves and for metadata. The various levels of compliance are discussed below, and data suppliers are strongly encouraged to adhere to these standards where possible. Further detail on which standards each metadata element conforms to is given in the full documentation available from the link at the top of this page.
All BADC-CSV files must contain the Conventions metadata line at the top of every data record within the file, and every data element must have a long_name metadata tag. In addition, at least one data element must be denoted as a coordinate_variable. These three elements ensure that the file meets the basic requirements of the CF metadata conventions standard.
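The three basic requirements can be checked mechanically. The sketch below is one possible way to do so and is not the official BADC-CSV checker. It assumes header lines of the form label,reference,value(s), a start marker line reading data, and a row of column references immediately after that marker; the function name check_basic_compliance and both sample records are hypothetical.

```python
import csv
import io


def check_basic_compliance(record_text):
    """Return a list of problems; an empty list means the three checks passed."""
    lines = [row for row in csv.reader(io.StringIO(record_text, newline="")) if row]
    problems = []
    if not lines or lines[0][0].strip() != "Conventions":
        problems.append("first line is not a Conventions line")

    header, column_refs = [], []
    for i, row in enumerate(lines):
        # Assumed start marker: a line whose only non-empty field is "data".
        if row[0].strip().lower() == "data" and not any(row[1:]):
            if i + 1 < len(lines):
                column_refs = [ref.strip() for ref in lines[i + 1]]
            break
        header.append(row)

    def refs_for(label):
        # References (second field) of every header line carrying this label.
        return {row[1].strip() for row in header
                if row[0].strip() == label and len(row) > 1}

    named = refs_for("long_name")
    missing = [ref for ref in column_refs if ref not in named]
    if not column_refs:
        problems.append("no data section found")
    if missing:
        problems.append("no long_name for column(s): " + ", ".join(missing))
    if not refs_for("coordinate_variable"):
        problems.append("no coordinate_variable declared")
    return problems


# Hypothetical records: GOOD satisfies all three checks, BAD omits two tags.
GOOD = (
    "Conventions,G,BADC-CSV,1\n"
    "long_name,1,time,days since 2007-03-14\n"
    "coordinate_variable,1,x\n"
    "long_name,2,air temperature,K\n"
    "data\n1,2\n0.8,2.4\nend data\n"
)
BAD = (
    "Conventions,G,BADC-CSV,1\n"
    "long_name,1,time,days since 2007-03-14\n"
    "data\n1,2\n0.8,2.4\nend data\n"
)

print(check_basic_compliance(GOOD))
print(check_basic_compliance(BAD))
```

Returning a list of problems rather than a single true/false value lets a data producer see every missing element in one pass.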
In addition, the file must conform to the standard conventions for a comma separated values file, where: a line is a single comma separated list ending in a line break (carriage return and line feed, \r\n); elements containing items such as line breaks, quotation marks or commas should be enclosed within quotation marks. Within such an element, each quotation mark must be doubled, i.e. in place of "word", ""word"" should be written, such that the line would be, for example:
long_name,1,"Bill said, ""simply typing words is not enough, as correct syntax is also required!"""
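These quoting rules need not be applied by hand. As a small sketch, Python's standard csv module with its default settings quotes only the fields that need it, doubles embedded quotation marks and ends each line with \r\n, which reproduces the example line above. The variable names are illustrative only.

```python
import csv
import io

# One metadata line: label, reference, and a value containing commas and quotes.
row = [
    "long_name",
    1,
    'Bill said, "simply typing words is not enough, '
    'as correct syntax is also required!"',
]

buffer = io.StringIO(newline="")  # newline="" keeps the writer's "\r\n" untouched
writer = csv.writer(buffer)       # defaults: minimal quoting, doubled quotes, "\r\n"
writer.writerow(row)
line = buffer.getvalue()
print(line, end="")

# Reading the line back recovers the original text with single quotation marks.
parsed = next(csv.reader(io.StringIO(line, newline="")))
print(parsed[2])
```

When writing to a real file instead of a string buffer, open it with newline="" for the same reason.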
The creator, source, observation_station, activity, feature_type, location, date_valid, last_revised_date and history controlled metadata elements should be provided to ensure that the BADC-CSV file conforms to the requirements of the CF, NASA-Ames, ISO 19115, Dublin Core, CSML and CEDA's MOLES catalogue standards.
It is recommended that where possible other metadata elements from the controlled list (see documentation) should be completed.
Concatenating data records
As each record is clearly defined by the Conventions line at the top and the end data line at the bottom, it is possible to place additional records into a BADC-CSV file. Each record should be treated independently, as it may be extracted from the file for use elsewhere. However, comment lines may be added outside the records themselves, which file creators can use to convey file-specific information to end users. Such information should be brief and kept at the top of the file.
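Because each record is bracketed by these two lines, a concatenated file can be separated back into independent records with a simple scan. The sketch below assumes the Conventions label is the first comma-separated field of a record's opening line and that the closing line reads end data; any non-blank line outside a record is treated as a comment. The function name split_records and the sample file content are hypothetical.

```python
def split_records(text):
    """Return (comments, records): lines outside any record, and record strings."""
    comments, records, current = [], [], None
    for line in text.splitlines():
        label = line.split(",", 1)[0].strip()
        if current is None:
            if label == "Conventions":
                current = [line]          # a new record starts here
            elif line.strip():
                comments.append(line)     # free text outside the records
        else:
            current.append(line)
            if label.lower() == "end data":
                records.append("\n".join(current))
                current = None            # record closed; back outside
    return comments, records


# Hypothetical file: one comment line followed by two concatenated records.
SAMPLE = """\
Daily extract prepared for demonstration
Conventions,G,BADC-CSV,1
long_name,1,time,days since 2007-03-14
coordinate_variable,1,x
data
1
0.5
end data
Conventions,G,BADC-CSV,1
long_name,1,time,days since 2007-03-14
coordinate_variable,1,x
long_name,2,air temperature,K
data
1,2
0.8,2.4
end data
"""

comments, records = split_records(SAMPLE)
print(len(records), "records;", len(comments), "comment line(s)")
```

This simple scan splits on raw lines, so it would need a real CSV parser if a quoted metadata value ever contained a line break.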
A few example files are available below:
Simple Example: This simple example shows a basic BADC-CSV file to demonstrate the layout and also a simple sample of data. This has all the basic required elements in it to be acceptable.
Full Example 1: This is a sample taken from the UK Met Office MetDB dataset showing the AMDARS message type. It shows how metadata for all 35 parameters have been encoded, with flagging variables described fully in the metadata section. The file was prepared within a spreadsheet before being exported as comma-separated data. It includes all the elements required for the file to have basic and complete levels of compliance, and also shows how additional metadata tags have been added to provide information from the Met Office's MetDB system.
Full Example 2: This more complicated example demonstrates that large numbers of parameters can be accommodated within the file format and that two or more different types of data can be placed in one file. It shows the concatenation of two data records, the second being a different MetDB message type (in this case a ship SYNOP message). Comment lines have been added to the top of the file to demonstrate where such additional information should be placed and that this is a free text field. Here the metadata fields are prepared within a spreadsheet and then added to the data files on a daily basis when the raw data are ingested into the CEDA archive, demonstrating that such files can easily be constructed by scripts.
The BADC-CSV format has been structured to comply with a number of data standards. More information about these standards can be found through these references:
- Basic information about metadata
- Climate and Forecasting Metadata Conventions (CF)
- NASA-Ames metadata conventions
- Dublin Core
- Climate Science Modelling Language (CSML)
- Metadata Objects for Linking Environmental Sciences (MOLES)
Checking your data files
An interactive facility to check files for compliance with the BADC-CSV format is available on the BADC-CSV checker page.
Uploading BADC-CSV files to BADC
For programmes currently submitting BADC-CSV formatted files, CEDA provides a web-based file uploader. During upload, files are checked for compliance with the BADC-CSV standard.