Dictionary-based order-preserving string compression

Authors:
Gennady Antoshenkov
Affiliations:
Oracle Corporation, New England Development Center, 110 Spitbrook Road, Nashua, NH 03062, USA; e-mail: gantoshe@us.oracle.com
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
1997

Abstract

Just as no database exists without indexes, no index implementation exists without order-preserving key compression, in particular prefix and tail compression. However, despite its great potential for making indexes smaller and faster, the application of general compression methods to ordered data sets has advanced very little. This paper demonstrates that fast dictionary-based methods can be applied to order-preserving compression with almost the same freedom as in the general case. The proposed technique has the same speed as traditional order-indifferent dictionary encoding and a compression rate that is only marginally lower. Procedures for encoding and for generating the encode tables are described, covering such order-related features as ordered data set restrictions, sensitivity and insensitivity to character position, and one-symbol encoding of each frequent trailing character sequence. The experimental results demonstrate fivefold compression on real-life data sets and twelvefold compression on Wisconsin benchmark text fields.
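
To make the idea of dictionary encoding that preserves sort order concrete, here is a minimal toy sketch in Python. It is not the paper's encoding procedure or its encode-table generator; it only illustrates one way a dictionary can be laid out so that comparing code sequences gives the same result as comparing the original strings. The assumptions are all illustrative: keys use only the letters a-z, dictionary tokens are single letters plus a hand-picked set of two-letter sequences, and the names `build_intervals`, `encode`, and `decode` are hypothetical. The sketch splits the space of all strings into sorted, disjoint intervals, each of which shares a common leading token, and uses the interval's position as the code.

```python
from bisect import bisect_right
import string

ALPHABET = string.ascii_lowercase  # assumption: keys contain only a-z


def build_intervals(pairs):
    """Return sorted (lower_bound, token) entries covering all a-z strings.

    Every string s with lower_bound <= s < next lower_bound starts with
    `token`, so the entry index can serve as an order-preserving code.
    `pairs` is a hypothetical set of frequent two-letter sequences.
    """
    intervals = []
    for c in ALPHABET:
        intervals.append((c, c))  # strings starting with c, before any pair
        for d in ALPHABET:
            if c + d not in pairs:
                continue
            intervals.append((c + d, c + d))  # strings starting with the pair
            nxt = chr(ord(d) + 1)
            if d != "z" and c + nxt not in pairs:
                # strings starting with c that sort after the pair
                intervals.append((c + nxt, c))
    return intervals


def encode(s, intervals):
    """Encode s as a list of interval indexes (one small integer per token)."""
    if not set(s) <= set(ALPHABET):
        raise ValueError("this sketch only handles the letters a-z")
    lowers = [lo for lo, _ in intervals]
    codes = []
    while s:
        i = bisect_right(lowers, s) - 1  # interval containing the remainder
        codes.append(i)
        s = s[len(intervals[i][1]):]     # consume the interval's token
    return codes


def decode(codes, intervals):
    """Concatenate the tokens of the encoded intervals."""
    return "".join(intervals[i][1] for i in codes)


table = build_intervals({"th", "he", "in", "er"})
keys = ["then", "tea", "the", "her", "tin", "to"]
# Sorting by code sequence gives the same order as sorting the raw strings.
print(sorted(keys, key=lambda k: encode(k, table)))
```

Because each interval gets a distinct integer in sort order, the code lists can be compared element by element without decoding, and a key that contains a dictionary pair takes fewer codes than it has characters. A practical scheme would also need to choose tokens from data statistics and pack the codes into bits, which this sketch leaves out.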