A compression technique for large statistical data-bases

  • Authors:
  • Susan J. Eggers; Frank Olken; Arie Shoshani

  • Venue:
  • VLDB '81 Proceedings of the seventh international conference on Very Large Data Bases - Volume 7
  • Year:
  • 1981

Abstract

In this paper we explore the compression of large statistical databases and propose techniques for organizing the compressed data, such that the time required to access the data is logarithmic. Our techniques are variations of run-length encoding, in which modified run-lengths for the series are extracted from the data stream and stored in a header, which is used to form the base level of a B-tree index into the database. The run-lengths are cumulative, and therefore the access time of the data is logarithmic in the size of the header. We discuss the details of the compression scheme and its implementation, present several special cases and give an analysis of the relative performance of the various versions.
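The core idea in the abstract, cumulative run-lengths kept in a header that is searched in logarithmic time, can be illustrated with a small sketch. This is not the paper's implementation: the function names `compress` and `lookup` are hypothetical, the data is plain run-length encoded rather than using the paper's modified run-lengths, and a binary search over a sorted in-memory list stands in for the B-tree index built on the header.

```python
from bisect import bisect_right
from itertools import groupby


def compress(values):
    """Run-length encode `values` and return (header, run_values).

    header[i] holds the cumulative number of elements in runs 0..i,
    so the header is sorted and can be searched in O(log n).
    (Hypothetical helper; plain RLE, not the paper's exact scheme.)
    """
    header, run_values, total = [], [], 0
    for value, group in groupby(values):
        total += sum(1 for _ in group)  # length of this run
        header.append(total)
        run_values.append(value)
    return header, run_values


def lookup(header, run_values, position):
    """Return the original value at 0-based `position`.

    Binary search stands in here for descending a B-tree whose
    base level is the header (an assumption of this sketch).
    """
    size = header[-1] if header else 0
    if not 0 <= position < size:
        raise IndexError(position)
    # Index of the first run whose cumulative count exceeds `position`.
    run = bisect_right(header, position)
    return run_values[run]


data = [0, 0, 0, 0, 7, 7, 0, 0, 0, 3]
header, run_values = compress(data)
# Every position decodes to its original value without scanning the runs.
recovered = [lookup(header, run_values, i) for i in range(len(data))]
```

Because each header entry is a running total rather than an individual run length, locating the run that contains a given position needs no prefix summation, which is what makes the access time logarithmic in the size of the header.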