Range queries in OLAP data cubes

  • Authors:
  • Ching-Tien Ho;Rakesh Agrawal;Nimrod Megiddo;Ramakrishnan Srikant

  • Affiliations:
  • IBM Almaden Research Center, 650 Harry Road, San Jose, CA;IBM Almaden Research Center, 650 Harry Road, San Jose, CA;IBM Almaden Research Center, 650 Harry Road, San Jose, CA;IBM Almaden Research Center, 650 Harry Road, San Jose, CA

  • Venue:
  • SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
  • Year:
  • 1997

Quantified Score

Hi-index 0.00

Visualization

Abstract

A range query applies an aggregation operation over all selected cells of an OLAP data cube where the selection is specified by providing ranges of values for numeric dimensions. We present fast algorithms for range queries for two types of aggregation operations: SUM and MAX. These two operations cover techniques required for most popular aggregation operations, such as those supported by SQL.For range-sum queries, the essential idea is to precompute some auxiliary information (prefix sums) that is used to answer ad hoc queries at run-time. By maintaining auxiliary information which is of the same size as the data cube, all range queries for a given cube can be answered in constant time, irrespective of the size of the sub-cube circumscribed by a query. Alternatively, one can keep auxiliary information which is 1/bd of the size of the d-dimensional data cube. Response to a range query may now require access to some cells of the data cube in addition to the access to the auxiliary information, but the overall time complexity is typically reduced significantly. We also discuss how the precomputed information is incrementally updated by batching updates to the data cube. Finally, we present algorithms for choosing the subset of the data cube dimensions for which the auxiliary information is computed and the blocking factor to use for each such subset.Our approach to answering range-max queries is based on precomputed max over balanced hierarchical tree structures. We use a branch-and-bound-like procedure to speed up the finding of max in a region. We also show that with a branch-and-bound procedure, the average-case complexity is much smaller than the worst-case complexity.