Querying Compressed Data in Data Warehouses

Authors:
Anindya Datta; Helen Thomas
Affiliations:
Georgia Institute of Technology, Atlanta, GA 30332, USA (anindya@loochi.mgt.gatech.edu; helen@loochi.mgt.gatech.edu)
Venue:
Information Technology and Management
Year:
2002

Abstract

The large size of most data warehouses (typically hundreds of gigabytes to terabytes) leads to non-trivial storage costs and makes compression techniques attractive. Page-level compression (as opposed to attribute- or record-level schemes) has generally been shown to achieve the greatest reductions in database storage size. A key issue with such schemes is how to access the data quickly to answer queries, since individual tuple boundaries are lost once a page is compressed. In this paper we introduce an approach that aims to retain the benefits of page-level compression (i.e., large reductions in storage size) while improving query performance through an efficient signature file indexing scheme. The approach uses an attribute-level signature generation method that exploits the value distribution of each attribute in a data warehouse. We provide an extensive theoretical analysis in which we compare our approach with a recently proposed indexing technique, encoded bitmap indexing, along a number of important metrics, including query processing, insertion, and storage costs. The results show that our approach is preferable in many situations likely to occur in practice. We have also implemented a prototype system that validates our analytical findings.
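The abstract combines page-level compression with a signature file, but it does not spell out how signatures are built. The Python sketch below illustrates the general idea under stated assumptions. Each compressed page keeps an uncompressed signature formed by superimposing (OR-ing) attribute-level bits for its tuples. A query decompresses only the pages whose signature covers the query signature, then re-checks the tuples to discard false drops. The attribute layout is a hypothetical way of "exploiting the value distribution" and is not the authors' scheme: low-cardinality attributes get one bit per value, and other attributes are hashed with crc32. The names `SignatureScheme`, `build_pages`, `select`, `max_exact`, `hash_bits`, and `rows_per_page` are illustrative. `zlib` and JSON serialization stand in for whatever page compressor the paper actually uses.

```python
import json
import zlib
from typing import Dict, List, Optional, Sequence, Tuple

Row = Tuple[str, ...]


class SignatureScheme:
    """Hypothetical attribute-level signature generator (not the paper's exact method).

    Each attribute owns a disjoint slice of the page signature. Attributes with at
    most `max_exact` distinct values get one bit per value, so they never cause
    false drops; higher-cardinality attributes are hashed into `hash_bits` bits.
    """

    def __init__(self, rows: Sequence[Row], max_exact: int = 8, hash_bits: int = 32):
        # Per attribute: (bit offset, slice width, value->bit map, or None if hashed)
        self.layout: List[Tuple[int, int, Optional[Dict[str, int]]]] = []
        offset = 0
        for col in range(len(rows[0])):
            distinct = sorted({row[col] for row in rows})
            if len(distinct) <= max_exact:
                mapping = {value: i for i, value in enumerate(distinct)}
                self.layout.append((offset, len(distinct), mapping))
                offset += len(distinct)
            else:
                self.layout.append((offset, hash_bits, None))
                offset += hash_bits

    def attr_bit(self, col: int, value: str) -> Optional[int]:
        offset, width, mapping = self.layout[col]
        if mapping is None:
            # Deterministic hash (crc32) into this attribute's slice.
            return 1 << (offset + zlib.crc32(value.encode()) % width)
        if value not in mapping:
            return None  # the value never occurs in the data
        return 1 << (offset + mapping[value])

    def signature(self, values: Dict[int, str]) -> Optional[int]:
        """Superimpose (OR) the bits of the given column values; None if any is absent."""
        sig = 0
        for col, value in values.items():
            bit = self.attr_bit(col, value)
            if bit is None:
                return None
            sig |= bit
        return sig


def build_pages(rows: Sequence[Row], scheme: SignatureScheme,
                rows_per_page: int = 4) -> List[Tuple[int, bytes]]:
    """Compress rows page by page, keeping one uncompressed signature per page."""
    pages = []
    for start in range(0, len(rows), rows_per_page):
        chunk = list(rows[start:start + rows_per_page])
        page_sig = 0
        for row in chunk:
            row_sig = scheme.signature(dict(enumerate(row)))
            if row_sig is None:
                raise ValueError("row contains a value unknown to the scheme")
            page_sig |= row_sig
        pages.append((page_sig, zlib.compress(json.dumps(chunk).encode())))
    return pages


def select(pages: List[Tuple[int, bytes]], scheme: SignatureScheme,
           predicates: Dict[int, str]) -> Tuple[List[Row], int]:
    """Equality selection; returns matching rows and the number of pages decompressed."""
    query_sig = scheme.signature(predicates)
    if query_sig is None:
        return [], 0  # some predicate value never occurs, so nothing can match
    results: List[Row] = []
    decompressed = 0
    for page_sig, blob in pages:
        if (page_sig & query_sig) != query_sig:
            continue  # no tuple on this page can satisfy the predicates
        decompressed += 1
        for row in json.loads(zlib.decompress(blob)):
            # Re-check each tuple: hashed bits may produce false drops.
            if all(row[col] == value for col, value in predicates.items()):
                results.append(tuple(row))
    return results, decompressed


if __name__ == "__main__":
    regions = ["east", "west", "north", "south"]
    rows = [(regions[i // 4], "p%d" % (i % 3), "order%02d" % i) for i in range(16)]
    scheme = SignatureScheme(rows)
    pages = build_pages(rows, scheme)
    print(select(pages, scheme, {0: "north", 1: "p0"}))  # ([('north', 'p0', 'order09')], 1)
```

In the usage example, only one of the four compressed pages is decompressed, because the region bits are exact and each page holds a single region. Keeping signatures outside the compressed pages is what allows pruning without touching compressed data. The trade-off, which the paper analyzes against encoded bitmap indexing, is the extra signature storage and the cost of false drops on hashed attributes.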