Reordering columns for smaller indexes

Authors:
Daniel Lemire; Owen Kaser
Affiliations:
LICEF, Université du Québec à Montréal (UQAM), 100 Sherbrooke West, Montreal, QC, Canada H2X 3P2; Department of CSAS, University of New Brunswick, 100 Tucker Park Road, Saint John, NB, Canada
Venue:
Information Sciences: an International Journal
Year:
2011

Abstract

Column-oriented indexes, such as projection or bitmap indexes, are compressed by run-length encoding to reduce storage and increase speed. Sorting the tables improves compression. On realistic data sets, permuting the columns in the right order before sorting can reduce the number of runs by a factor of two or more. Unfortunately, determining the best column order is NP-hard. For many cases, we prove that the number of runs in table columns is minimized if we sort columns by increasing cardinality. Experimentally, sorting based on Hilbert space-filling curves is poor at minimizing the number of runs.
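
The effect described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names `count_runs` and `total_runs` and the toy table are assumptions made for illustration. The sketch permutes the columns, sorts the rows lexicographically, and counts the runs that run-length encoding would have to store in each column.

```python
from itertools import groupby, product


def count_runs(column):
    """Count maximal runs of identical consecutive values in a sequence."""
    return sum(1 for _ in groupby(column))


def total_runs(rows, column_order):
    """Permute columns, sort rows lexicographically, sum the runs per column.

    Hypothetical helper for illustration only.
    """
    permuted = sorted(tuple(row[c] for c in column_order) for row in rows)
    if not permuted:
        return 0
    # zip(*permuted) transposes the sorted rows into columns.
    return sum(count_runs(col) for col in zip(*permuted))


# Toy table (an assumption): column 0 has 2 distinct values,
# column 1 has 8 distinct values, and every combination occurs once.
rows = list(product(range(2), range(8)))

low_first = total_runs(rows, (0, 1))   # columns by increasing cardinality
high_first = total_runs(rows, (1, 0))  # columns by decreasing cardinality

print(low_first, high_first)  # prints "18 24"
```

On this toy table, putting the low-cardinality column first gives 18 runs against 24 for the reverse order. This is a single hand-built case, not evidence for the general result, which the abstract states holds "for many cases" rather than always.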