Compression of large data grids for internet transmission

  • Authors: Paul Wessel

  • Affiliations: Department of Geology and Geophysics, School of Ocean and Earth Science and Technology, University of Hawaii, 1680 East-West Road, Honolulu, HI

  • Venue: Computers & Geosciences
  • Year: 2003

Abstract

Large gridded data sets useful in the Earth sciences are routinely made available on the Internet. In most cases the data files are provided as 2-byte (16-bit) binary integer values; these are often compressed with general-purpose utilities such as gzip or bzip2 to reduce file sizes prior to transmission. However, even files compressed in this way can be hundreds of megabytes in size, contributing to network congestion and excessive transmission times. This paper presents a command-line tool (grdzip) that takes advantage of the structure of gridded data sets to achieve better overall compression. The data are first reduced to double differences and packed using a variable-width bit-packing algorithm before being written via the general-purpose compression routines in the bzip2 library. The process is fully reversible. Testing on eight commonly used gridded data sets shows that grdzip compresses better and faster than either gzip or bzip2 alone, typically producing compressed files that are 25% smaller than bzip2 output and less than 50% of the size of gzip output. Depending on the original file size, these savings can be considerable and can significantly reduce both transmission time and data server disk space.
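
The three stages named in the abstract (double differencing, variable-width bit packing, bzip2 compression) can be illustrated with a minimal Python sketch. It is not grdzip's implementation or file format. It assumes that "double differences" means differencing along rows and then down columns, that one bit width is chosen per row, and that signed values are mapped to unsigned codes with a zigzag transform; the abstract specifies none of these details. All function names and the 8-byte header are hypothetical.

```python
import bz2
import struct


def double_difference(grid):
    """Difference along each row, then down each column (assumed meaning)."""
    rows = [[r[0]] + [r[j] - r[j - 1] for j in range(1, len(r))] for r in grid]
    out = [rows[0]]
    for i in range(1, len(rows)):
        out.append([rows[i][j] - rows[i - 1][j] for j in range(len(rows[i]))])
    return out


def undo_double_difference(dd):
    """Invert double_difference exactly with cumulative sums."""
    rows = [list(dd[0])]
    for i in range(1, len(dd)):
        rows.append([dd[i][j] + rows[i - 1][j] for j in range(len(dd[i]))])
    grid = []
    for row in rows:
        acc, restored = 0, []
        for v in row:
            acc += v
            restored.append(acc)
        grid.append(restored)
    return grid


def zigzag(v):
    """Map a signed integer to a non-negative code (0, -1, 1, -2 -> 0, 1, 2, 3)."""
    return 2 * v if v >= 0 else -2 * v - 1


def unzigzag(u):
    """Invert zigzag."""
    return u >> 1 if u % 2 == 0 else -((u + 1) >> 1)


def pack_row(values):
    """Pack one row using the smallest bit width that fits every code in it."""
    codes = [zigzag(v) for v in values]
    width = max(1, max(c.bit_length() for c in codes))
    acc = 0
    for c in codes:
        acc = (acc << width) | c
    nbytes = (width * len(codes) + 7) // 8
    # One leading byte records the width chosen for this row.
    return bytes([width]) + acc.to_bytes(nbytes, "big")


def unpack_row(buf, offset, count):
    """Read one packed row of `count` values; return (values, next offset)."""
    width = buf[offset]
    nbytes = (width * count + 7) // 8
    acc = int.from_bytes(buf[offset + 1:offset + 1 + nbytes], "big")
    mask = (1 << width) - 1
    codes = [(acc >> (width * (count - 1 - k))) & mask for k in range(count)]
    return [unzigzag(c) for c in codes], offset + 1 + nbytes


def compress_grid(grid):
    """Double-difference, bit-pack row by row, then compress with bzip2."""
    header = struct.pack(">II", len(grid), len(grid[0]))  # hypothetical header
    body = b"".join(pack_row(row) for row in double_difference(grid))
    return bz2.compress(header + body)


def decompress_grid(blob):
    """Reverse compress_grid and return the original integer grid."""
    buf = bz2.decompress(blob)
    nrows, ncols = struct.unpack_from(">II", buf, 0)
    offset, dd = 8, []
    for _ in range(nrows):
        row, offset = unpack_row(buf, offset, ncols)
        dd.append(row)
    return undo_double_difference(dd)
```

Because every step uses exact integer arithmetic, `decompress_grid(compress_grid(grid))` returns the input grid unchanged, which mirrors the reversibility the abstract claims. On smooth grids the double differences tend to be small, so each row needs only a few bits per value before bzip2 is applied; the sketch makes no attempt to reproduce the compression ratios reported in the paper.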