BADC-CSV Format for Data Exchange

Introduction

This page gives a brief outline of the BADC-CSV file format, which is covered in greater depth in the "BADC-CSV Text File Guide for Users and Producers".
Note - for more details on how to read more generic comma- or tab-separated (csv or tsv) data, please see our ASCII data format page.

The guide is available from the CEDA Documentary repository.

Format History: An Alternative to NASA-Ames

CEDA has used NASA-Ames formatted data for many years. NASA-Ames was devised primarily as a format for aircraft observations, but can be adapted for many types of atmospheric observation data. However, NASA-Ames is complex and confusing for users, who tend to strip the header off and import the text file into Excel. The metadata is generally not used in its machine-readable form, but is simply read by the researcher. Much effort is also expended supporting data producers in the creation of NASA-Ames files: producers see the format as complicated, and such files cannot be created simply from spreadsheet packages like Excel. Additionally, the metadata fields offered by NASA-Ames are fixed and inflexible.

Model data stored at CEDA often uses the NetCDF format with CF conventions. This provides a format framework with good, flexible metadata. The format can be read from a number of languages and analysis packages including FORTRAN, Python, Matlab and IDL. It is, however, difficult for a researcher with little technical knowledge to use.

To solve these problems a new file format was developed to bring the advantages from the NetCDF file format into a simple text file. The approach was to use metadata conventions on top of comma separated values files (CSV) as produced by applications like Excel.

BADC-CSV Format: for which type of data?

BADC-CSV has been designed specifically with simple, common data structures in mind, often referred to as 1-D data, such as:

  • a collection of surface meteorological data from one or more stations over a period of time
  • a time series of data from one instrument

For more complex data structures - e.g. variation in both time and height or model output - users should seek to use formats such as NetCDF which allow storage and access to such data much more readily than a csv file.

File names

File names for data files held by CEDA (including those in the BADC-CSV format) will adhere to the file naming convention described on the File Name page. In the case of BADC-CSV files the extension should be ".csv".

Record structure

Each file contains one or more data records, each of which is made of two parts:

  1. the top of the data record holds header information, which includes information about the data itself and supplementary information that helps users understand how the data were produced and interpret them correctly - collectively known as metadata.
  2. the second part of the record contains the data itself.

Each part of the record within the BADC-CSV file has various components. Some of these components are compulsory, while others are advisory. In addition, there is scope for additional information to be added by the file creator within the metadata section. Each part of the data record is discussed in greater depth below, while the components required to make the file compliant with various standards are discussed in the Standards Compliance section.

Header or Metadata section

The header includes, in a defined order and format, all the information needed to read and understand the data, namely:

  • Conventions,G,BADC-CSV,1 - This is the "File Type Identifier", a compulsory line for all data records. It tells any user the file format and indicates that the file follows the Climate and Forecast (CF) Metadata Conventions (see the CF standards documentation for more information).
  • [label],[ref],[value,value,value...] - This is the form of any subsequent metadata element. The label is a metadata tag, which may be an item from the list of controlled metadata items (see the full documentation for the complete list) or one generated by the file producer. Note: items within the controlled list have special meanings and should not be used other than as prescribed. All words within the label should be joined by underscores ("_"), and the label should contain no white space and be entirely lower case. The [ref] element links the metadata element to the relevant part of the record: a "G" indicates that the metadata tag applies globally to the entire data record, while any other alphanumeric string can be used as a reference between metadata elements and the data itself. All subsequent values associated with the metadata element are given in a comma separated list, as indicated here by [value,value,...].

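
The [label],[ref],[value,...] form described above can be sketched in Python as follows. This is only an illustration: the helper name metadata_line and the example values are invented for this page and are not part of the BADC-CSV specification. The standard csv module is used so that values containing commas are quoted.

```python
import csv
import io


def metadata_line(label, ref, *values):
    """Build one '[label],[ref],[value,...]' header line (hypothetical helper)."""
    # Labels are lower case, with words joined by underscores and no white space.
    clean_label = "_".join(label.lower().split())
    buffer = io.StringIO()
    csv.writer(buffer).writerow([clean_label, ref, *values])
    # The csv module appends "\r\n"; strip it so the caller controls line endings.
    return buffer.getvalue().rstrip("\r\n")


# "G" applies the element to the whole record; "1" links it to data column "1".
# The values below are invented for illustration only.
global_line = metadata_line("Creator", "G", "A. Scientist", "Example Institute")
column_line = metadata_line("Long Name", "1", "air temperature", "K")
```

Here global_line is a record-wide element, while column_line is tied to the data column whose reference is "1".
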
Data section

This includes two markers to indicate the start and end of the data, as well as the references that link the data to the relevant metadata tags defined above.

  • data - This word always appears on the line immediately after the last metadata element. It indicates the start of the data block of the record and must be entirely in lowercase.
  • [references] - The line following the line containing "data" holds a comma separated list of the references for the data that follows, in the order in which the data appear. These references link the data to the relevant metadata tags.
  • [data lines] - The data are then given as comma separated values over as many lines as required.
  • end data - The end of the data block, and of the record itself, is marked by a single line containing these words in lowercase.
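
As a rough illustration of how the header and data sections fit together, the Python sketch below reads a single record. It is a minimal sketch under stated assumptions: the sample record and the function name parse_record are invented for this page, the file is assumed to hold exactly one record, and lines are assumed to carry no trailing empty fields (which some spreadsheet exports add).

```python
import csv
import io

# Invented sample record, for illustration only.
SAMPLE_RECORD = (
    "Conventions,G,BADC-CSV,1\r\n"
    "long_name,time,time of observation,hours\r\n"
    "coordinate_variable,time,x\r\n"
    "long_name,1,air temperature,K\r\n"
    "data\r\n"
    "time,1\r\n"
    "0,280.1\r\n"
    "1,280.4\r\n"
    "end data\r\n"
)


def parse_record(text):
    """Return (metadata, references, rows) for one record (hypothetical helper)."""
    metadata = []      # (label, ref, values) tuples, in file order
    references = None  # column references from the line after "data"
    rows = []          # data lines, each a list of strings
    in_data = False
    for fields in csv.reader(io.StringIO(text, newline="")):
        if not fields:
            continue  # skip blank lines
        if not in_data:
            if fields == ["data"]:
                in_data = True
            else:
                label, ref, *values = fields
                metadata.append((label, ref, values))
        elif fields == ["end data"]:
            break
        elif references is None:
            references = fields
        else:
            rows.append(fields)
    return metadata, references, rows


metadata, references, rows = parse_record(SAMPLE_RECORD)
```

The values in rows are left as strings; converting them to numbers or dates is left to the caller, guided by the metadata attached to each reference.
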

Standards Compliance

The BADC-CSV format has been structured so that, if all recommendations are followed, the file will comply with a number of standards, both for csv files themselves and for metadata. The levels of compliance are discussed below, and data suppliers are strongly encouraged to adhere to them where possible. Further detail on which standards each metadata element conforms to is given in the full documentation available from the link at the top of this page.

Basic

All BADC-CSV files must contain the Conventions metadata line at the top of all data records within the file and all data elements must have a long_name metadata tag. In addition at least one data element must be denoted as a coordinate_variable. These three elements ensure that the file meets the basic requirements of the CF metadata conventions standard.
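
The three basic requirements can be expressed as a simple check. The Python sketch below is illustrative only and is not the CEDA checker: the function name and the input structures (a list of (label, ref, values) tuples in file order, and the list of column references) are assumptions made for this example.

```python
def basic_compliance_problems(metadata, references):
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []

    # 1. The record must start with the Conventions line.
    label, ref, values = metadata[0] if metadata else (None, None, [])
    if (label, ref) != ("Conventions", "G") or list(values[:1]) != ["BADC-CSV"]:
        problems.append("first metadata line is not the Conventions line")

    # 2. Every data element (column reference) needs a long_name.
    named = {m_ref for m_label, m_ref, _ in metadata if m_label == "long_name"}
    for column in references:
        if column not in named:
            problems.append(f"column {column!r} has no long_name")

    # 3. At least one element must be marked as a coordinate_variable.
    if not any(m_label == "coordinate_variable" for m_label, _, _ in metadata):
        problems.append("no coordinate_variable defined")

    return problems


# Invented metadata for illustration only.
good = [
    ("Conventions", "G", ["BADC-CSV", "1"]),
    ("long_name", "time", ["time of observation", "hours"]),
    ("coordinate_variable", "time", ["x"]),
    ("long_name", "1", ["air temperature", "K"]),
]
ok_result = basic_compliance_problems(good, ["time", "1"])
bad_result = basic_compliance_problems(good[:2], ["time", "1"])
```
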

In addition, the file must conform to the standard conventions for a comma separated values file: each line is a single comma separated list ending in a carriage return and line feed (\r\n); elements containing line breaks, quotation marks or commas should be enclosed within quotation marks; and any quotation mark inside an element must be doubled, i.e. "word" should be written as ""word"". The line would then be, for example:

	long_name,1,"Bill said, ""simply typing words is not enough, as correct syntax is also required!"""
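
Most csv libraries apply these quoting rules automatically. As a hedged illustration, the Python sketch below uses the standard csv module to write the example line above and read it back; the variable names are chosen for this example only.

```python
import csv
import io

text = 'Bill said, "simply typing words is not enough, as correct syntax is also required!"'

# Write one metadata line; the writer quotes the element because it contains
# a comma and quotation marks, and doubles each embedded quotation mark.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\r\n")
writer.writerow(["long_name", "1", text])
line = buffer.getvalue()

# Reading the line back recovers the original, unescaped text.
fields = next(csv.reader(io.StringIO(line, newline="")))
```
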

Complete

The creator, source, observation_station, activity, feature_type, location, date_valid, last_revised_date and history controlled metadata elements should be provided to ensure that the BADC-CSV file conforms to the requirements of the CF, NASA-Ames, ISO19115, Dublin Core, CSML and CEDA's MOLES catalogue standards.
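
A file producer might check for these elements before submission. The Python sketch below simply reports which of the labels listed above are absent; the function name is hypothetical and the check says nothing about whether the values themselves are valid.

```python
# Labels taken from the list above.
COMPLETE_LABELS = (
    "creator", "source", "observation_station", "activity", "feature_type",
    "location", "date_valid", "last_revised_date", "history",
)


def missing_complete_labels(labels_present):
    """Return the 'complete' level labels not found in labels_present."""
    present = set(labels_present)
    return [label for label in COMPLETE_LABELS if label not in present]


still_needed = missing_complete_labels(["creator", "source", "long_name"])
```
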

Recommended

It is recommended that where possible other metadata elements from the controlled list (see documentation) should be completed.

Concatenating data records

As each record is clearly delimited by the Conventions line at the top and the end data line at the bottom, it is possible to place more than one record in a BADC-CSV file. Each record should be treated independently, as it may be extracted from the file for use elsewhere. However, comment lines may be added outside the records themselves, which file creators can use to convey file-specific information to end users. Such information should be brief and kept at the top of the file.
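
Because the two delimiting lines are fixed, a file can be split into its records with very little code. The Python sketch below is one possible approach, not a reference implementation: the function name and sample text are invented, any non-blank line outside a record is treated as a comment, and elements containing embedded line breaks are assumed not to occur.

```python
def split_records(text):
    """Return (comments, records); each record is a list of its lines."""
    comments, records, current = [], [], None
    for line in text.splitlines():
        if current is None:
            if line.startswith("Conventions,"):
                current = [line]          # a new record begins
            elif line.strip():
                comments.append(line)     # free text outside any record
        else:
            current.append(line)
            if line.strip() == "end data":
                records.append(current)   # the record is complete
                current = None
    # A record with no closing "end data" line is not returned.
    return comments, records


# Invented two-record file for illustration only.
sample = (
    "Two records follow\n"
    "Conventions,G,BADC-CSV,1\nlong_name,1,first value,1\ndata\n1\n5\nend data\n"
    "Conventions,G,BADC-CSV,1\nlong_name,1,second value,1\ndata\n1\n7\nend data\n"
)
comments, records = split_records(sample)
```
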

Examples

A few example files are available below:

Simple Example: This simple example shows a basic BADC-CSV file to demonstrate the layout and also a simple sample of data. This has all the basic required elements in it to be acceptable.

Full Example 1: This is a sample taken from the UK Met Office MetDB dataset showing the AMDARS message type. This shows how metadata for all 35 parameters has been encoded, with flagging variables described fully in the metadata section. This was prepared within a spreadsheet before exporting as comma-separated data. It includes not only all the elements required for the file to have basic and complete levels of compliance, but also shows how additional metadata tags have been added to provide information from the Met Office's MetDB system.

Full Example 2: To demonstrate that large numbers of parameters can be accommodated within the file format, and that two or more different types of data can be placed into one file, this more complicated example file shows the concatenation of two data records, the second being a different MetDB message type (in this case the ship SYNOP message). Comment lines have been added to the top of the file to demonstrate where such additional information should be placed and that this is a free text field. Here the metadata fields are prepared within a spreadsheet and then added to the data files on a daily basis when the raw data are ingested into the CEDA archive, demonstrating that such files can easily be constructed by scripts.

Further reading

The BADC-CSV format has been structured to comply with a number of data standards. More information about these standards can be found through these references:

Checking your data files

An interactive facility to check files for compliance with the BADC-CSV format is available on the BADC-CSV checker page.

Uploading BADC-CSV files to BADC

For programmes currently submitting BADC-CSV formatted files, CEDA provides a web-based file uploader. In the process, files are checked for compliance with the BADC-CSV standard.
