Compressed and tar-ed Data

Moving and storing files can often be made more efficient by compressing them (e.g. .gz, .bz, .zip), sometimes in combination with "tar", which bundles several files together into a single archive file (a "tarball").

This article touches on three of the main file compression types common in the CEDA archives, along with the tar archive format.

compression type   file extension   description / utilities
zip                .zip             https://support.microsoft.com/en-us/kb/259177
                                    http://www.7-zip.org/
gzip               .gz              http://www.gzip.org/
bzip               .bz              http://www.7-zip.org/
tar                .tar             https://www.gnu.org/software/tar/manual/tar.html
                                    https://www.gnu.org/software/tar/

Users will also notice that some files combine compression with tar archiving. For example, .tar.gz implies a tarball that has had gzip compression applied, whilst .gz.tar implies a tar collection of individually gzipped files.
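
The order of the extensions determines how such a file is read. As a minimal sketch, assuming Python's standard tarfile and gzip modules (the function names are illustrative and not part of any CEDA tooling), the two layouts could be handled like this:

```python
import gzip
import tarfile


def read_tar_gz(path):
    """Return {member name: bytes} for a gzip-compressed tarball (.tar.gz)."""
    contents = {}
    # "r:gz" decompresses the outer gzip stream while walking the tar members.
    with tarfile.open(path, mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile():
                contents[member.name] = tar.extractfile(member).read()
    return contents


def read_tar_of_gz(path):
    """Return {name without .gz: bytes} for an uncompressed tar of gzipped files (.gz.tar)."""
    contents = {}
    # "r:" reads a plain tar; each gzipped member is then decompressed on its own.
    with tarfile.open(path, mode="r:") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(".gz"):
                raw = tar.extractfile(member).read()
                contents[member.name[:-3]] = gzip.decompress(raw)
    return contents
```

Both functions read the members into memory rather than unpacking to disk, which keeps the sketch short; for large archives, writing each member to a file as it is read would be more appropriate.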

For archiving purposes, though, CEDA does not generally compress files or bundle them together. Where this has been done, it is usually for one of the following reasons:

  • To make it easier for end users to access and download large numbers of files and/or large individual files. For example, NIMROD rain radar products tend to be gzipped and then tarred together into daily tarballs. This saves users from having to deal with many hundreds of files for any given day, and reduces both the number of file objects and the overall volume of data in the archive.
  • For consistency with older data. If a dataset has always been compressed, we will generally continue to compress additions to that dataset so that anyone with automated processing of the data is not suddenly disrupted.
  • To follow an established convention. For example, Sentinel data uses zip as part of its SAFE packaging, and the Sentinel user community has software that expects the zipped version.
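
For zipped products, it is often enough to inspect or read the zip file directly rather than unpack it first. The following is a minimal sketch, assuming Python's standard zipfile module; the function names are hypothetical and are not taken from any Sentinel or CEDA software.

```python
import zipfile


def list_zip_members(path):
    """Return (name, compressed size, uncompressed size) for each file in a zip."""
    with zipfile.ZipFile(path) as zf:
        return [
            (info.filename, info.compress_size, info.file_size)
            for info in zf.infolist()
            if not info.is_dir()  # skip directory entries
        ]


def read_zip_member(path, member):
    """Read one member's bytes from the zip without unpacking the other members."""
    with zipfile.ZipFile(path) as zf:
        return zf.read(member)
```

Listing the members first shows what a downloaded product contains, and how much space it would need once unpacked, before anything is written to disk.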
