Querying Compressed Data in Data Warehouses

  • Authors:
  • Anindya Datta; Helen Thomas

  • Affiliations:
  • Georgia Institute of Technology, Atlanta, GA 30332, USA (anindya@loochi.mgt.gatech.edu); Georgia Institute of Technology, Atlanta, GA 30332, USA (helen@loochi.mgt.gatech.edu)

  • Venue:
  • Information Technology and Management
  • Year:
  • 2002

Abstract

The large size of most data warehouses (typically hundreds of gigabytes to terabytes) results in non-trivial storage costs and makes compression techniques attractive. For the most part, page-level compression (as opposed to attribute or record level schemes) has been shown to achieve the greatest reductions in storage size for databases. A key issue with such schemes is how to quickly access the data to answer queries, since individual tuple boundaries are lost. In this paper we introduce an approach that aims to maintain the benefits of page-level compression (i.e., large reductions in storage size), while at the same time improving query performance through an efficient signature file indexing scheme. The approach uses an attribute-level signature generation method that exploits the value distribution of each attribute in a data warehouse. We provide an extensive theoretical analysis of this approach in which we compare our approach with a recently proposed indexing technique, encoded bitmapped indexing, along a number of important metrics including query processing, insertion, and storage costs. Results show that our approach is preferred in many situations that are likely to occur in practice. We have also implemented a prototype system which validates our analytical findings.
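
To make the general idea concrete, here is a minimal sketch of a signature file placed in front of page-compressed data. It is not the authors' scheme: the abstract describes a signature generation method driven by each attribute's value distribution, whereas this sketch assumes a plain CRC-based hash, a fixed 64-bit signature per attribute per page, `zlib` as the page compressor, and JSON as the row encoding. The names `SIG_BITS`, `value_bit`, `build_page`, and `query` are hypothetical.

```python
import json
import zlib

SIG_BITS = 64  # assumed signature width per attribute (illustrative choice)


def value_bit(value, bits=SIG_BITS):
    """Map a value to a single bit position using a deterministic hash."""
    return 1 << (zlib.crc32(repr(value).encode()) % bits)


def build_page(rows):
    """Compress a list of row dicts as one page.

    Returns (compressed_blob, signatures), where signatures maps each
    attribute name to the OR of the bits of every value on the page.
    """
    signatures = {}
    for row in rows:
        for attr, val in row.items():
            signatures[attr] = signatures.get(attr, 0) | value_bit(val)
    blob = zlib.compress(json.dumps(rows).encode())
    return blob, signatures


def query(pages, attr, value):
    """Equality query: decompress only pages whose signature may match.

    Returns (matching_rows, number_of_pages_decompressed).
    """
    probe = value_bit(value)
    hits, decompressed = [], 0
    for blob, signatures in pages:
        if (signatures.get(attr, 0) & probe) == 0:
            continue  # bit not set: the value cannot be on this page
        decompressed += 1
        rows = json.loads(zlib.decompress(blob))
        # Signatures can give false positives, so verify every row.
        hits.extend(r for r in rows if r.get(attr) == value)
    return hits, decompressed


pages = [
    build_page([{"region": "east", "qty": 1}, {"region": "west", "qty": 2}]),
    build_page([{"region": "north", "qty": 3}]),
]
rows, touched = query(pages, "region", "north")
```

A cleared signature bit guarantees the value is absent, so that page is never decompressed. A set bit only means the value may be present, which is why the rows are re-checked after decompression.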