Using grouping variables to express complex decision support queries

  • Authors:
  • Damianos Chatziantoniou

  • Affiliations:
  • Department of Management Science and Technology, Athens University of Economics and Business, Evelpidon 47A & Lefkados 33, 11362 Athens, Greece

  • Venue:
  • Data & Knowledge Engineering
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Performing complex analysis on top of massive data stores is essential to most modern enterprises and organizations and requires simple, flexible and powerful syntactic constructs to express naturally and succinctly complex decision support queries. In addition, these linguistic features have to be coupled by appropriate evaluation and optimization techniques in order to efficiently compute these queries. In this article we review the concept of grouping variable and describe a simple SQL extension to match it. We show that this extension enables the facile expression of a large class of practical data analysis queries. Besides syntactic simplicity, grouping variables can be neatly modeled in relational algebra via a relational operator, called MD-join. MD-join combines joins and group-bys (a frequent case in decision support queries) into one operator, allowing novel evaluation and optimization techniques. By making explicit how joins interact with group bys, we provide the optimizer with enough information to use specific algorithms and employ appropriate optimization plans, not easily detectable previously. Several experiments demonstrate substantial performance improvements, in some cases of one or two orders of magnitude. The work on grouping variables have influenced at least one commercial system and the standardization of ANSI SQL and implementations of it have been studied in the context of telecom applications, medical and bio-informatics, finance and others. Finally, current work studies the potential of grouping variables in formulating decision support queries over streams of data, one of the latest research trends in database community.