Indexing Sequences of IEEE 754 Double Precision Numbers

Authors:
Antonio Farina;Alberto Ordonez;Jose R. Parama
Venue:
DCC '12 Proceedings of the 2012 Data Compression Conference
Year:
2012

Cited by:
Querying RDF dictionaries in compressed space (ACM SIGAPP Applied Computing Review)

Abstract

In recent decades, much attention has been paid to the development of succinct data structures to store and/or index text, biological collections, source code, etc. Their success was in most cases due to handling data with a relatively small alphabet and to exploiting either a rather skewed symbol distribution (text) or simply the repetitiveness within the source data (source code repositories, biological sequences of similar individuals). In this work, we face the problem of dealing with collections of floating point data, which typically have a large alphabet (a real number hardly ever repeats) and a less biased distribution. We present two solutions to store and index such collections. The first is based on the well-known inverted index. It uses space close to the size of the original collection while providing appealing search times. The second uses a wavelet tree, which, at the expense of slower search times, achieves slightly lower space consumption.
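
To make the first idea concrete, the following is a minimal sketch of an inverted index over a sequence of doubles. The abstract does not describe how the index is built, so everything here is an assumption made for illustration: the choice of the leading 16 bits of the IEEE 754 encoding as the index key, and the hypothetical helper names `key_of`, `build_index`, and `locate`. It should not be read as the authors' actual construction.

```python
import struct
from collections import defaultdict


def key_of(x: float, bits: int = 16) -> int:
    """Return the top `bits` bits of the 64-bit IEEE 754 encoding of x.

    With bits=16 the key holds the sign, the 11 exponent bits, and the
    4 most significant mantissa bits (an illustrative choice, not the
    paper's).
    """
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    return raw >> (64 - bits)


def build_index(values, bits: int = 16):
    """Map each key to the sorted list of positions whose value has that key."""
    index = defaultdict(list)
    for pos, v in enumerate(values):
        index[key_of(v, bits)].append(pos)
    return index


def locate(values, index, x: float, bits: int = 16):
    """Return all positions of x: fetch candidates by key, then verify."""
    candidates = index.get(key_of(x, bits), [])
    return [p for p in candidates if values[p] == x]


data = [3.14, 2.71, 3.14, 1.0e10, -0.5]
idx = build_index(data)
print(locate(data, idx, 3.14))  # prints [0, 2]
```

Grouping values by a short prefix of their bit pattern keeps the number of distinct keys small even though full 64-bit values rarely repeat; each query then checks only the candidates in one posting list.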