Strategies for processing ad hoc queries on large data warehouses

  • Authors:
  • Kurt Stockinger; Kesheng Wu; Arie Shoshani

  • Affiliations:
  • CERN, Geneva, Switzerland; Lawrence Berkeley Nat'l Lab, Berkeley, CA; Lawrence Berkeley Nat'l Lab, Berkeley, CA

  • Venue:
  • Proceedings of the 5th ACM international workshop on Data Warehousing and OLAP
  • Year:
  • 2002

Abstract

As data warehousing applications grow in size, existing data organizations and access strategies, such as relational tables and B-tree indexes, are becoming increasingly ineffective. The two primary reasons for this are that these datasets involve many attributes and that the queries on the data usually involve conditions on small subsets of the attributes. Two strategies are known to address these difficulties well, namely vertical partitioning and bitmap indexes. In this paper, we summarize our experience of implementing a number of bitmap index schemes on vertically partitioned data tables. One important observation is that simply scanning the vertically partitioned data tables is often more efficient than using B-tree based indexes to answer ad hoc range queries on static datasets. For these range queries, compressed bitmap indexes are in most cases more efficient than scanning vertically partitioned tables. We evaluate the performance of two different compression schemes for bitmap indexes stored in various ways. Using the compression scheme called Word-Aligned Hybrid Code (WAH) to store the bitmaps in plain files shows the best overall performance for bitmap indexes. Tests indicate that our bitmap index strategy based on WAH is not only efficient for attributes of low cardinality, say,