The large size of most data warehouses (typically hundreds of gigabytes to terabytes) results in non-trivial storage costs and makes compression techniques attractive. For the most part, page-level compression (as opposed to attribute- or record-level schemes) has been shown to achieve the greatest reductions in storage size for databases. A key issue with such schemes is how to access the data quickly to answer queries, since individual tuple boundaries are lost inside a compressed page. In this paper we introduce an approach that aims to retain the benefits of page-level compression (i.e., large reductions in storage size) while improving query performance through an efficient signature file indexing scheme. The approach uses an attribute-level signature generation method that exploits the value distribution of each attribute in a data warehouse. We provide an extensive theoretical analysis in which we compare our approach with a recently proposed indexing technique, encoded bitmap indexing, along a number of important metrics, including query processing, insertion, and storage costs. The results show that our approach is preferable in many situations that are likely to occur in practice. We have also implemented a prototype system that validates our analytical findings.
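The abstract does not spell out the signature scheme itself. As a rough illustration of the general idea, signature-file filtering over compressed pages, the following minimal sketch uses classic superimposed coding. Each (attribute, value) pair is hashed to a few bits, and a page's signature is the OR of the bits of every value it holds. A query then decompresses only those pages whose signature covers the query's bits. This is an assumption-laden stand-in, not the paper's distribution-aware attribute-level method. The names `SignatureFile`, `value_bits`, `SIG_BITS`, and `BITS_PER_VALUE` are hypothetical, and `zlib` plays the role of a generic page-level compressor.

```python
import hashlib
import json
import zlib
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical parameters; the paper derives signatures from value distributions instead.
SIG_BITS = 64        # width of each page signature in bits
BITS_PER_VALUE = 3   # hash functions applied per (attribute, value) pair


def value_bits(attr: str, value: object) -> int:
    """Map one (attribute, value) pair to a mask with up to BITS_PER_VALUE bits set."""
    mask = 0
    for i in range(BITS_PER_VALUE):
        digest = hashlib.sha256(f"{attr}={value!r}#{i}".encode()).digest()
        mask |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return mask


class SignatureFile:
    """Superimposed-coding signatures kept alongside whole-page compressed data (a sketch)."""

    def __init__(self) -> None:
        self.pages: List[bytes] = []     # each page is compressed as a single unit
        self.signatures: List[int] = []  # one signature per compressed page

    def add_page(self, rows: List[Row]) -> None:
        sig = 0
        for row in rows:
            for attr, value in row.items():
                sig |= value_bits(attr, value)  # superimpose every value's bits
        self.pages.append(zlib.compress(json.dumps(rows).encode()))
        self.signatures.append(sig)

    def query(self, predicates: Row) -> List[Row]:
        """Exact-match conjunctive query; only candidate pages are decompressed."""
        qsig = 0
        for attr, value in predicates.items():
            qsig |= value_bits(attr, value)
        results: List[Row] = []
        for page, sig in zip(self.pages, self.signatures):
            if (sig & qsig) != qsig:
                continue  # some query bit is missing, so no row on this page can match
            rows = json.loads(zlib.decompress(page).decode())
            # Recheck each row exactly to discard "false drops" (signature matches without a real match).
            results.extend(
                r for r in rows if all(r.get(a) == v for a, v in predicates.items())
            )
        return results


if __name__ == "__main__":
    idx = SignatureFile()
    idx.add_page([{"region": "east", "product": "tv"}, {"region": "west", "product": "radio"}])
    idx.add_page([{"region": "north", "product": "phone"}])
    print(idx.query({"region": "north"}))  # [{'region': 'north', 'product': 'phone'}]
```

Signatures of this kind can admit pages that hold no matching tuple (false drops), but they never skip a page that does. Each candidate page is therefore decompressed and its rows rechecked exactly. The attribute-level, distribution-aware signatures described above aim to make this filter more selective, so that fewer pages need to be decompressed per query.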