On the equivalence and rewriting of aggregate queries

  • Authors:
  • Stéphane Grumbach;Maurizio Rafanelli;Leonardo Tininini

  • Affiliations:
  • INRIA, BP 105, Rocquencourt, 78153, Le Chesnay, France;CNR-IASI, BP 105, viale Manzoni 30, 00185, Roma, Italy;CNR-IASI, BP 105, viale Manzoni 30, 00185, Roma, Italy

  • Venue:
  • Acta Informatica
  • Year:
  • 2004

Abstract

We introduce a first-order language with real polynomial arithmetic and aggregation operators (count, iterated sum and multiply), which is well suited for the definition of aggregate queries involving complex statistical functions. It offers a good trade-off between expressive power and complexity, with a tractable data complexity. Interestingly, some fundamental properties of first-order logic with real arithmetic are preserved in the presence of aggregates. In particular, there is an effective quantifier elimination for formulae with aggregation. We then consider the problem of querying data that has already been aggregated in aggregate views, and focus on queries with an aggregation over a conjunctive query (namely, single-block aggregate group-by queries without a HAVING clause). Our main conceptual contribution is the introduction of a new equivalence relation among conjunctive queries, the isomorphism modulo a product. We prove that the equivalence of aggregate queries such as, for instance, averages reduces to it. Deciding whether two queries are isomorphic modulo a product is shown to be NP-complete. We then analyze the equivalence problem in the case of aggregate conjunctive queries with comparisons. We introduce the concept of cross isomorphic linear expansions, which generalizes isomorphism modulo a product, and we show that equivalence reduces to it and that it can be decided in PSPACE. Finally, we show that the problem of complete rewriting of count queries using count views is NP-complete, and we introduce new rewriting techniques based on the isomorphism modulo a product to recover the values of counts by complex arithmetical computation from the views.
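
The intuition behind comparing average queries "modulo a product" can be illustrated with a small sketch. It is not taken from the paper: the relations `sales` and `unrelated`, and the helpers `group_avg` and `group_count`, are hypothetical names chosen for illustration, and bag semantics is assumed. The sketch shows that joining a query with an unrelated, nonempty relation multiplies every group's count by the same factor, so the per-group averages are unchanged while the counts are not.

```python
from collections import defaultdict
from itertools import product


def group_avg(rows):
    """Per-group average over (key, value) pairs, duplicates kept (bag semantics)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for key, value in rows:
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def group_count(rows):
    """Per-group tuple count over (key, value) pairs."""
    counts = defaultdict(int)
    for key, _ in rows:
        counts[key] += 1
    return dict(counts)


# Hypothetical data: a base relation and an unrelated relation with 3 tuples.
sales = [("north", 10.0), ("north", 20.0), ("south", 6.0)]
unrelated = ["a", "b", "c"]

# q1 ranges over sales only; q2 is the cartesian product with `unrelated`,
# projected back to (key, value), so every tuple appears 3 times.
q1 = list(sales)
q2 = [(key, value) for (key, value), _ in product(sales, unrelated)]

avg1, avg2 = group_avg(q1), group_avg(q2)
cnt1, cnt2 = group_count(q1), group_count(q2)

print(avg1 == avg2)  # averages agree
print(cnt1, cnt2)    # counts differ by the factor len(unrelated)
```

The sketch assumes the extra relation is nonempty; with an empty relation the product is empty and the groups disappear. It only checks one database instance and says nothing about how equivalence is decided in general.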