File Formats Demystified

Introduction

A file format is a way of encoding information in a computer file. A format specifies how the bytes in the file are to be interpreted as information that has meaning to the programs and people reading and writing them. Each format is designed to carry a particular type of data, though some formats are more specific and others more general in what they can hold. For example, the PNG format is excellent for encoding an image, but could not easily be used to store a 3D computer-aided design model.

Text formats are those where the bytes in the file should be interpreted as text characters. This means that generic text editors can be used to view or change the data. This is very useful if your data is small and can be interpreted by human inspection. There are different ways to encode text, but most text files are encoded as ASCII or Unicode.
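As a minimal sketch of what "interpreting bytes as text" means, the Python snippet below decodes a few bytes as ASCII and then encodes a string containing a non-ASCII character. The sample values are invented for illustration and are not taken from any CEDA dataset.

```python
# Bytes as they might appear in a small text file (made-up sample data).
raw = b"temperature,21.5\n"

# Decoding turns the bytes into characters; plain ASCII is enough here.
text = raw.decode("ascii")

# A label containing a degree sign, which is outside the ASCII range.
label = "21.5 °C"

# UTF-8 (a Unicode encoding) can represent it, using two bytes for the degree sign.
encoded = label.encode("utf-8")

# ASCII cannot represent it, so encoding fails.
try:
    label.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(repr(text), len(label), len(encoded), ascii_ok)
```

The same file can therefore look correct or garbled depending on which encoding the reading program assumes, which is why it helps to record the encoding alongside the data.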

Binary formats are those where the bytes have to be interpreted according to the rules of the specific format to work out their meaning. This necessitates the use of specialised programs to read and write the data.
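To illustrate why format rules are needed, here is a small Python sketch that writes and reads a record using a made-up binary layout. The layout (a 4-byte tag, an unsigned 16-bit count and a 32-bit float, all big-endian) is purely hypothetical and does not correspond to any of the formats listed in this article.

```python
import struct

# Hypothetical record layout, assumed for illustration only:
#   ">"  big-endian, no padding
#   "4s" 4-byte tag
#   "H"  unsigned 16-bit integer (a count)
#   "f"  32-bit float (a value)
RECORD = struct.Struct(">4sHf")

# Writing: values are packed into bytes according to the layout.
payload = RECORD.pack(b"DEMO", 3, 1.5)

# Reading: the same layout is required to recover the values.
tag, count, value = RECORD.unpack(payload)

print(RECORD.size, tag, count, value)
```

Without knowing the layout, the ten bytes in `payload` are just numbers; a reader that assumed a different layout would recover different, meaningless values.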

The CEDA Archive includes a wide range of file formats: some are well supported, while other, historical ones are less so. The tables below list some of the main formats within the CEDA Archive, with links to tools supporting each format. For information about which format is used for a dataset, please see the CEDA data catalogue.

Additionally, how information is stored within or about files (so-called metadata) is key to how the data within files can be used. See the information about metadata formats in the "Introduction to metadata" article.

Core Supported Formats

Format | Type | File endings | Commonly used for
BADC-CSV | text | .csv | Simple "1-D" type of data, e.g. instrument time series data
NASA Ames | text | .na | Aircraft and older instrument data (older data may have an older file-naming convention)
HITRAN | text | various | Spectroscopy data
JCAMP-DX | text | .dx, .jdx | Only suitable for spectra from spectroscopy experiments
NetCDF | binary | .nc | CEDA's preferred data format. Model data and observational data with more than 1 dimension (e.g. time-height data). Suitable for gridded numeric data such as model output. CF conventions preferred - migration to make CF compliant acceptable.
HDF | binary | .hdf | Satellite data. Requires consistent conventions to be followed.
PP | binary | .pp | Met Office model output
GRIB | binary | .grb | ECMWF model output
GEOTIFF | image | .tif, .TIFF | Earth observation imagery data
JPEG2000 | image | .jp2 | Earth observation imagery data
JPEG | image | .jpg | For images
TIFF | image | .tif, .tiff | For images

Other Accepted Formats

A range of other formats have been added to the CEDA Archive over time. Some of these are historical, whilst others come from third-party sources that CEDA obtains in a facilitating role. Not all are listed here; refer to the file format information on dataset records in the data catalogue.

Format | Type | File endings | Commonly used for | Notes
PNG | image | .png |  | 
BUFR | binary |  | meteorology data | WMO standard
Nimrod format | binary | .dat | Met Office NIMROD rain radar data | To be superseded by ODIMS compliant HDF5
BIL | binary | .bil | flat binary format used by ENVI users - produced by ARSF processing node | 
LAS |  |  | point cloud for EO data | 
PDF(a) |  | .pdf | suitable for documentation only | 
plain text | text | .txt | suitable for documentation only. Data should utilise an approved format instead. | 
ENVI-HDF |  |  |  | 

Not Accepted Formats

These formats have been reviewed by CEDA and deemed not acceptable for long-term data archival.

Format | Type | File endings | Alternative format to use
csv/tsv | text | .csv, .tsv | BADC-CSV
Excel | text | .xls | BADC-CSV
HTML | text | .html | BADC-CSV
Word | text | .doc, .docx | BADC-CSV, PDF

Compression and Aggregations

At times it is desirable to compress files to reduce overall volumes, and to aggregate files together to aid transfer and storage. These options come into play primarily where there are either large numbers of files or large data volumes, though the impact on onward use of the data (the need to uncompress or unpack) should be considered too.

Note - these should only be applied to files that are already formatted in a permitted format given above.

Format | Type | File endings | Commonly used for
internal compression | compression | (retains main format ending) | Reducing file sizes, e.g. HDF5, netCDF
tar | aggregation | .tar | Aggregating a number of files as a "tar ball" allows a set of files to be downloaded together.
gzip, bzip, zip | compression | .gz, .bz, .zip | Reduce the volume required for the file to aid transfer and storage. Note: requires uncompressing before use.
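
The aggregation and compression steps in the table above can be sketched with Python's standard library. This is an illustrative example only, not a CEDA tool: the function name `bundle_and_compress` and the sample file names are invented, and the sample files stand in for data already held in a permitted format.

```python
import gzip
import shutil
import tarfile
import tempfile
from pathlib import Path


def bundle_and_compress(paths, tar_path):
    """Aggregate files into a tar ball, then gzip the tar ball.

    Hypothetical helper for illustration; returns the path of the .tar.gz file.
    """
    tar_path = Path(tar_path)
    # Step 1: aggregation. Mode "w" writes an uncompressed tar file.
    with tarfile.open(tar_path, "w") as tar:
        for path in paths:
            tar.add(path, arcname=Path(path).name)
    # Step 2: compression. Copy the tar bytes through gzip.
    gz_path = tar_path.with_name(tar_path.name + ".gz")
    with open(tar_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path


# Demonstration in a temporary directory with made-up sample files.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    files = []
    for name in ("a.csv", "b.csv"):
        path = tmp_dir / name
        path.write_text("value\n1\n")
        files.append(path)

    archive = bundle_and_compress(files, tmp_dir / "bundle.tar")
    archive_name = archive.name

    # The archive must be uncompressed before use; "r:gz" does this on reading.
    with tarfile.open(archive, "r:gz") as tar:
        names = sorted(tar.getnames())

print(archive_name, names)
```

Keeping aggregation and compression as separate steps mirrors the table: the tar ball groups the files, and gzip then reduces the volume of the single resulting file.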