ASCII Data Formats

The CEDA archives contain many different data formats, including both "binary" and "ASCII" formats (such as "tsv" or "csv" data). ASCII (American Standard Code for Information Interchange) is a term often used to mean that the data are stored in a human-readable manner, though strictly it refers to a specific set of permitted characters (i.e. a-z, A-Z, 0-9 and a few others).

CEDA encourages the use of specific formats for encoding data (see the list on our File Formats Demystified page), and these cover the majority of the CEDA archive. However, some datasets (mainly legacy ones) use other formats, often referred to as "ASCII" formats.

Identifying an ASCII formatted file

The simplest way to spot an ASCII file is to try to open it: if you can read the contents then it's a text file. However, it is also worthwhile checking whether the format is bespoke to that dataset or conforms to one of the main archive text file formats, such as BADC-CSV or NASA-Ames. To help check, try the following:

  • Check the dataset's catalogue page for any format information and linked documentation
  • ASCII files often have a .ascii, .dat or .txt file ending, though older files may follow the earlier 8.3 file-naming convention.
  • BADC-CSV files have a .csv file ending and have Conventions,G,BADC-CSV at the top.
  • NASA-Ames files typically have a .na file ending and a top line consisting of two numbers: the first is the number of lines in the file before the data, and the second (a 4-digit number) indicates the type of NASA-Ames file.

These two formats, and others, are detailed further on the File Formats Demystified page.
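
The first-line checks described above can also be scripted. The following is a minimal Python sketch, not an official CEDA tool: the function name guess_text_format is hypothetical, and it only tests the two signatures mentioned in the list (a Conventions,G,BADC-CSV line, or two numbers of which the second has 4 digits). A file reported as "unknown" may still be a valid bespoke ASCII format.

```python
def guess_text_format(first_line):
    """Guess the archive text format from the first line of a file.

    Returns "BADC-CSV", "NASA-Ames" or "unknown".
    (Hypothetical helper for illustration only.)
    """
    line = first_line.strip()

    # BADC-CSV: the top line starts with Conventions,G,BADC-CSV
    fields = [field.strip() for field in line.split(",")]
    if fields[:3] == ["Conventions", "G", "BADC-CSV"]:
        return "BADC-CSV"

    # NASA-Ames: two numbers, the second being a 4-digit format code.
    # Commas are treated as separators too, in case the file uses them.
    parts = line.replace(",", " ").split()
    if len(parts) == 2 and all(p.isdigit() for p in parts) and len(parts[1]) == 4:
        return "NASA-Ames"

    return "unknown"


# Example usage with first lines typed in by hand
print(guess_text_format("Conventions,G,BADC-CSV,1"))
print(guess_text_format("35 1001"))
```

For a real file, the first line can be obtained by opening the file in text mode and calling readline() before passing the result to the function.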

ASCII Format Documentation

When CEDA archives data that do not use one of our archive standard formats, we try to obtain supporting documentation describing that format and add links to it on the relevant dataset pages in the CEDA data catalogue. Links to such documents will be found under the "Docs" tab on the relevant dataset's catalogue page. However, should you be unable to find relevant documentation, please contact the CEDA helpdesk for further assistance.

Reading ASCII formatted data

Sometimes software tools may be available to read the data in the format provided. These will often be linked from the dataset's catalogue page under the "Docs" tab, or may be stored in a "software" folder within a dataset collection in the archive. An internet search may also reveal tools produced elsewhere.

However, this may not always be the case, and instead the data may need to be read into a data processing programme, spreadsheet or database. In such cases the documentation should give an indication of how to read these data in, or the following two sections may assist.

Before starting to handle the file, though, it can be helpful to get an idea of what it looks like by opening it with a text editor (for example, Notepad on Windows, Nano on Mac or Emacs on Linux).

Common ASCII data structure

A common approach to ASCII data is to have two sections to the file:

  • A header section - sometimes described as the metadata section. This may contain extra information either about how the data were collected/produced (e.g. instrument name, settings), some relevant additional information (e.g. who produced the data, where it was produced) or information on how to read the data in or interpret it (e.g. number of columns/rows in the data, headings for the columns, scale factors to use). However, these items may not be labelled to tell you what the values are for - here referring to external documentation is likely to help.
  • Data section - below the header section will be the data themselves, which may be split up using a delimiter such as a comma, tab, "|" or similar character.

Comma (CSV) and Tab (TSV) etc delimited data 

Often ASCII data will be delimited into columns to help users understand how the data should be treated. Common ways of delimiting the data are to use characters such as a comma, tab, semi-colon, "|" or space(s). Such data can readily be imported into programmes such as spreadsheets by using an "import" function for text files and selecting the appropriate delimiter(s). Guides on how to do this in common packages such as Excel, Google Sheets and Access can easily be found by searching for these on the internet.

However, care should be taken, as the file may have a top (header) section that is not laid out as delimited data in the same way as the data lines themselves, as in the example below.
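
The same import can be scripted rather than done through a spreadsheet. The following is a minimal Python sketch using the standard csv module. The function name read_delimited, the "|" delimiter and the sample content are all made up for illustration, and the number of header lines to skip has to come from the dataset's documentation or from looking at the file.

```python
import csv
import io


def read_delimited(stream, delimiter=",", header_lines=0):
    """Skip `header_lines` free-text lines, then parse the rest as delimited rows.

    (Hypothetical helper for illustration only.)
    """
    # Discard the header section, which is not delimited like the data
    for _ in range(header_lines):
        next(stream, None)

    reader = csv.reader(stream, delimiter=delimiter)
    # Drop completely empty lines; every value is returned as a string
    return [row for row in reader if row]


# Invented sample: two free-text header lines, then "|"-delimited data
sample = io.StringIO(
    "Station: Example\n"
    "Units: degC\n"
    "time|temp\n"
    "2020-01-01|3.5\n"
    "2020-01-02|4.1\n"
)
rows = read_delimited(sample, delimiter="|", header_lines=2)
for row in rows:
    print(row)
```

To read a real file, pass a file object opened in text mode with newline="" (as recommended for the csv module) instead of the io.StringIO sample.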

Example

One example of an ASCII formatted file not covered by the standard archive formats is the one used in the CRU TS data, which typically looks like the following:

Climatic Research Unit Country File created on Thu  2 Jul 2015 18:16:18 BST, from CRU TS run #1506241137
Country = Afghanistan          : Parameter = Precipitation                  : Units = mm/month            
Period = 1901.2014 : missing value = -999.0 : format = (i5,17f8.1)
 YEAR     JAN     FEB     MAR     APR     MAY     JUN     JUL     AUG     SEP     OCT     NOV     DEC     MAM     JJA     SON     DJF     ANN
 1901    65.4    13.5    46.8    32.5    50.9    20.1     7.8     2.6     3.9    11.5     7.7     7.7   130.1    30.6    23.1    48.6   270.4
 1902    19.8    21.1    41.6    33.8    14.1     5.2     2.5     3.3     1.9    19.8    53.9    19.0    89.5    11.0    75.6   117.9   236.0
 1903    49.0    49.8    68.2    38.1    80.6     7.1     7.4     6.6     6.3     2.0    12.0    21.4   186.8    21.1    20.2   113.8   348.4
 1904    67.8    24.6    77.2    23.6    28.2     0.3     3.0     4.1     8.6    21.6    26.3    20.4   129.0     7.4    56.4   128.8   305.6
 1905    71.3    37.1    71.4    34.4    15.8     4.5     1.1     3.6     5.5     2.4     3.0    43.1   121.7     9.2    10.9   176.8   293.2
 1906    20.8   112.9    56.3    43.2    14.6     6.2     6.7     5.5     2.3     4.7     5.4    34.7   114.1    18.4    12.4   137.8   313.2
 1907    52.0    51.1    40.9    50.8    40.8    15.1     6.4     4.7     3.3    19.0     8.3     9.9   132.5    26.2    30.6   127.5   302.3
 1908    80.1    37.5    55.5    87.1    10.5     3.1     9.8     7.3     6.1     2.8     1.2    45.1   153.2    20.2    10.1   137.4   346.2
 1909    31.5    60.8    52.1    79.4    26.4     8.1    12.8     2.5     2.5     5.9     4.4    47.5   157.9    23.4    12.7   161.3   333.9
 1910    82.2    31.5    53.7    30.0    19.0     4.8    13.5    17.0     0.7     2.5     1.9    22.5   102.7    35.4     5.2   140.2   279.5
 1911    83.3    34.5    91.8    30.2    32.9     1.3     0.4     2.5     0.9     6.3    11.5    21.1   154.9     4.2    18.7   129.5   316.7

Here the header or metadata section is clearly separated from the monthly values by the column headers line. The metadata section also gives useful information such as:

  • who produced the files
  • when they were produced
  • from what source data they were produced
  • which area is covered
  • what the data represent and their units
  • what period the data cover
  • how to identify "missing data" points by a given data value

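Below this header section, the data lines are arranged in whitespace-separated columns: as shown above, each line follows the fixed-width layout given by the format entry in the header, (i5,17f8.1), i.e. a 5-character year followed by 17 values of 8 characters each.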
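
A minimal Python sketch of reading the example above is given here. It is not an official CRU or CEDA reader. It assumes that the column headers line is the one beginning with YEAR, that values can be separated by splitting on any run of whitespace (which works for both tabs and padded spaces), and that the missing value is the -999.0 quoted in the header. The function name parse_cru_ts is hypothetical, and only the first two data rows of the example are repeated to keep the sketch short.

```python
SAMPLE = """\
Climatic Research Unit Country File created on Thu  2 Jul 2015 18:16:18 BST, from CRU TS run #1506241137
Country = Afghanistan          : Parameter = Precipitation                  : Units = mm/month
Period = 1901.2014 : missing value = -999.0 : format = (i5,17f8.1)
 YEAR     JAN     FEB     MAR     APR     MAY     JUN     JUL     AUG     SEP     OCT     NOV     DEC     MAM     JJA     SON     DJF     ANN
 1901    65.4    13.5    46.8    32.5    50.9    20.1     7.8     2.6     3.9    11.5     7.7     7.7   130.1    30.6    23.1    48.6   270.4
 1902    19.8    21.1    41.6    33.8    14.1     5.2     2.5     3.3     1.9    19.8    53.9    19.0    89.5    11.0    75.6   117.9   236.0
"""


def parse_cru_ts(text, missing=-999.0):
    """Return (column names, list of per-year dicts) from text like SAMPLE.

    (Hypothetical helper for illustration only.)
    """
    lines = text.splitlines()

    # The column headers line separates the metadata from the data
    start = next(i for i, line in enumerate(lines) if line.split()[:1] == ["YEAR"])
    columns = lines[start].split()

    records = []
    for line in lines[start + 1:]:
        parts = line.split()
        if len(parts) != len(columns):
            continue  # skip blank or incomplete lines
        # Replace the missing-value marker with None
        values = [None if float(p) == missing else float(p) for p in parts[1:]]
        record = {"YEAR": int(parts[0])}
        record.update(zip(columns[1:], values))
        records.append(record)
    return columns, records


columns, records = parse_cru_ts(SAMPLE)
print(records[0]["YEAR"], records[0]["JAN"], records[0]["ANN"])
```

Splitting on whitespace is the simplest approach for this file. A stricter reader would slice each line at the fixed widths given by the format entry instead, which also copes with values that fill their whole column.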
