Extending complex ad-hoc OLAP

  • Authors: Theodore Johnson; Damianos Chatziantoniou

  • Affiliations: Database Research Dept., AT&T Labs - Research; Dept. of CS, Stevens Institute of Technology

  • Venue: Proceedings of the eighth international conference on Information and knowledge management
  • Year: 1999

Abstract

Large scale data analysis and mining activities require sophisticated information extraction queries. Many queries require complex aggregation, and many of these aggregates are non-distributive. Conventional solutions to this problem involve defining User Defined Aggregate Functions (UDAFs). However, the use of UDAFs entails several problems. Defining a new UDAF can be a significant burden for the user, and optimizing queries involving UDAFs is difficult because of the “black box” nature of the UDAF.

In this paper, we present a method for expressing nested aggregates in a declarative way. A nested aggregate, which is a rollup of another aggregated value, expresses a wide range of useful non-distributive aggregation. For example, most frequent type aggregation can be naturally expressed using nested aggregation, e.g. “For each product, report its total sales during the month with the largest total sales of the product”. By expressing complex aggregates declaratively, we relieve the user of the burden of defining UDAFs, and allow the evaluation of the complex aggregates to be optimized.

We use the Extended Multi-Feature (EMF) syntax as the basis for expressing nested aggregation. An advantage of this approach is that EMF SQL can already express a wide range of complex aggregation in a succinct way, and EMF SQL is easily optimized into efficient query plans. We show that nested aggregation queries can be evaluated efficiently by using a small extension to the EMF SQL query evaluation algorithm. A side effect of this extension is to extend EMF SQL to permit complex aggregation of data from multiple sources.
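
To make the abstract's example query concrete ("For each product, report its total sales during the month with the largest total sales of the product"), the following is a minimal Python sketch of what such a nested aggregate computes. It is not EMF SQL and not the paper's evaluation algorithm; the sample rows, the `(product, month, amount)` layout, and the function name `best_month_sales` are assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: (product, month, sale_amount).
sales = [
    ("widget", 1, 10.0),
    ("widget", 1, 5.0),
    ("widget", 2, 12.0),
    ("gadget", 1, 7.0),
    ("gadget", 3, 4.0),
    ("gadget", 3, 6.0),
]


def best_month_sales(rows):
    """For each product, return (month, total) for its best-selling month."""
    # Inner aggregate: total sales per (product, month).
    monthly = defaultdict(float)
    for product, month, amount in rows:
        monthly[(product, month)] += amount

    # Outer aggregate: roll up the monthly totals per product and keep
    # the largest one. On ties, the month encountered first is kept.
    best = {}
    for (product, month), total in monthly.items():
        if product not in best or total > best[product][1]:
            best[product] = (month, total)
    return best


result = best_month_sales(sales)
for product, (month, total) in sorted(result.items()):
    print(product, month, total)
```

The two loops mirror the two levels of aggregation: a sum grouped by product and month, followed by a maximum over those sums grouped by product. The outer step depends on fully aggregated inner values, which is why the result cannot be obtained by a single distributive aggregate over the raw rows.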