Indexing Sequences of IEEE 754 Double Precision Numbers

  • Authors:
  • Antonio Farina;Alberto Ordonez;Jose R. Parama

  • Venue:
  • DCC '12 Proceedings of the 2012 Data Compression Conference
  • Year:
  • 2012

Abstract

In recent decades, much attention has been paid to the development of succinct data structures to store and/or index text, biological collections, source code, and similar data. Their success was in most cases due to handling data with a relatively small alphabet and to exploiting either a rather skewed symbol distribution (text) or the repetitiveness of the source data (source code repositories, biological sequences of similar individuals). In this work, we address the problem of handling collections of floating-point data, which typically have a large alphabet (the same real value rarely occurs twice) and a less biased distribution. We present two solutions to store and index such collections. The first is based on the well-known inverted index; it uses space close to the size of the original collection while providing appealing search times. The second uses a wavelet tree, which, at the expense of slower search times, achieves slightly better space consumption.
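
The abstract gives no implementation details, so the following is only a rough sketch of the general inverted-index idea applied to doubles, not the authors' actual structure. It assumes, purely for illustration, that each value is bucketed by the leading bits of its IEEE 754 binary64 encoding (sign, exponent, and the top of the mantissa) and that each bucket keeps the positions where such values occur. The names `double_key`, `build_index`, `locate`, and the `prefix_bits` parameter are hypothetical.

```python
import struct
from collections import defaultdict


def double_key(x: float, prefix_bits: int = 16) -> int:
    """Return the top `prefix_bits` bits of the binary64 encoding of x."""
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits >> (64 - prefix_bits)


def build_index(values, prefix_bits: int = 16):
    """Map each bit prefix to the list of positions holding such a value."""
    index = defaultdict(list)
    for pos, v in enumerate(values):
        index[double_key(v, prefix_bits)].append(pos)
    return index


def locate(values, index, x: float, prefix_bits: int = 16):
    """Return the positions of x: look up its bucket, then verify exactly."""
    candidates = index.get(double_key(x, prefix_bits), [])
    # Values sharing a prefix are only candidates, so compare the full value.
    return [p for p in candidates if values[p] == x]


values = [1.5, 2.25, 1.5, -0.75, 1.5000001]
index = build_index(values)
hits = locate(values, index, 1.5)
```

Because nearly every double is distinct, keying on the full 64-bit value would give one posting list per element; grouping by a bit prefix is one possible way to keep the vocabulary small, at the cost of the verification step in `locate`.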