The sizes of compressed bitmap indexes and of compressed data are significantly affected by the order of the data records. Finding the ordering of rows and columns that minimizes index size is known to be NP-hard. Instead of seeking the exact global optimum, we develop accurate statistical formulas for computing approximate solutions. Because the widely used bitmap indexes are compressed with variants of run-length encoding (RLE), our work concentrates on computing the sizes of bitmap indexes compressed with basic RLE. The resulting formulas could be used for choosing which indexes to build and which to use. In this paper, we use the formulas to develop strategies for reordering the rows and columns of a data table. We present empirical measurements showing that our formulas are accurate for a wide range of data. Our analysis confirms that the heuristic of sorting on columns with low cardinality first is indeed effective in reducing index sizes. We extend this strategy by showing that columns with the same cardinality should be ordered from high skewness to low skewness.
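The effect of column order on a run-length-encoded bitmap index can be illustrated with a small sketch. It does not reproduce the paper's statistical formulas. It assumes an equality-encoded index (one bitmap per distinct value of each column) and uses the total number of runs as a stand-in for the basic RLE size. The function names and the toy table are hypothetical.

```python
from itertools import product


def count_runs(bits):
    """Count maximal runs of identical values in a sequence of bits."""
    if not bits:
        return 0
    return 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)


def bitmap_index_runs(rows):
    """Total runs over all bitmaps of an equality-encoded bitmap index.

    One bitmap is built per (column, distinct value); the run count is
    used here as a proxy for the size under basic run-length encoding.
    """
    total = 0
    for c in range(len(rows[0])):
        column = [row[c] for row in rows]
        for value in set(column):
            total += count_runs([1 if x == value else 0 for x in column])
    return total


def sort_rows(rows, column_order):
    """Sort rows lexicographically, comparing columns in the given order."""
    return sorted(rows, key=lambda row: tuple(row[c] for c in column_order))


# Toy table (an assumption for illustration): column 0 has 2 distinct
# values, column 1 has 8, and every combination occurs exactly once.
table = list(product(range(2), range(8)))

low_card_first = bitmap_index_runs(sort_rows(table, [0, 1]))
high_card_first = bitmap_index_runs(sort_rows(table, [1, 0]))

print("runs, low-cardinality column first: ", low_card_first)
print("runs, high-cardinality column first:", high_card_first)
```

On this toy table, sorting on the low-cardinality column first yields 42 runs against 54 for the opposite order, which agrees with the heuristic stated above. A real index would use a word-aligned RLE variant, so absolute sizes would differ.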