Partitioned optimization of complex queries

Authors:
Damianos Chatziantoniou;Kenneth A. Ross
Affiliations:
Department of Management Science and Technology, Athens University of Economics and Business, Evelpidon 47A & Lefkados 33, 11362 Athens, Greece;Department of Computer Science, Columbia University, USA
Venue:
Information Systems
Year:
2007

Abstract

Performing complex analysis on top of massive data stores is essential to most modern enterprises and organizations, and it requires significant aggregation over different attribute sets (dimensions) of the participating relations. Such queries may take hours or days, which is unacceptable in most cases. It is therefore important to study these queries and identify frequent special cases that can be evaluated with specialized algorithms. Understanding complex aggregate queries leads to better execution plans and, consequently, better performance.

The idea of partitioning is central to aggregate queries. It can be used to define a class of queries called group queries. The main characteristic of a group query is that it can be evaluated in a partitioned (or groupwise) fashion, i.e., the underlying relation(s) can be partitioned (based on a set of attributes) into disjoint groups and each group can be processed separately, possibly in parallel. For example, a query that performs a complex operation (e.g. joins and/or selections and/or aggregations) within each group is a group query. To express it in SQL, one has to join/correlate several views and/or subqueries on the grouping attributes. A naive plan (where the joins are executed) may be very expensive, even for relatively small base relations. A groupwise evaluation, on the other hand, can lead to huge performance gains. We present a syntactic criterion to identify group queries in SQL and show that every group query can be expressed in a way that satisfies this criterion. This work is based on Chatziantoniou and Ross [Querying Multiple Features of Groups in Relational Databases. In: 22nd International Conference on Very Large Databases, VLDB, 1996, pp. 295-306].

The concept of group queries is useful not only for evaluation, but also for analyzing a complex decision support query that aggregates over different sets of attributes. Such a query may be decomposable into one or more query components, each of which is a group query. This observation allows parallel execution, multi-query processing and identification of special cases. We present two algorithms that decompose a complex aggregate query into its group query components.

The value of groupwise processing has recently been recognized by the research community, and it has been implemented in at least one major commercial system. To be of use in a relational system, however, partitioned evaluation has to be modeled as a relational operator. We review three different approaches for such an operator and propose a generalized groupwise operator. We also report experiments showing that naive optimization with the new operator, without considering decompositions into group query components, does not always lead to the most efficient plans.

An extended syntax is another way to identify frequent special cases and apply efficient algorithms. Having specific operators for common operations contributes to the succinctness and optimizability of certain queries (e.g. datacubes). An extended syntax is presented with emphasis on multi-feature queries, a frequent and practical subclass of group queries that is amenable to specialized evaluation, involving (potentially repeated) selection, grouping and aggregation over the same groups.
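
As a rough illustration of groupwise evaluation applied to a multi-feature query, the following Python sketch partitions a relation on a grouping attribute and then, within each group, repeatedly selects and aggregates. It is not the operator or the extended syntax proposed in the paper: the function `multi_feature`, the `features` list and the sample rows are all hypothetical names and data chosen for this example only.

```python
from collections import defaultdict


def multi_feature(rows, key, features):
    """Evaluate a chain of features independently on each group.

    rows     -- list of dicts (a stand-in for a relation; an assumption)
    key      -- name of the grouping attribute
    features -- list of (name, predicate, aggregate) triples, where
                predicate(row, env) may use features computed earlier
                for the same group, and aggregate(rows) returns a value
    """
    # Step 1: partition the relation into disjoint groups.
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    # Step 2: process each group separately; no cross-group join is needed.
    result = {}
    for group_key, group in groups.items():
        env = {}
        for name, predicate, aggregate in features:
            selected = [r for r in group if predicate(r, env)]
            env[name] = aggregate(selected)
        result[group_key] = env
    return result


# Hypothetical query: for each product, the minimum price and the
# total quantity sold at that minimum price.
features = [
    ("min_price",
     lambda r, env: True,
     lambda rs: min(r["price"] for r in rs)),
    ("qty_at_min",
     lambda r, env: r["price"] == env["min_price"],
     lambda rs: sum(r["qty"] for r in rs)),
]

rows = [
    {"prod": "x", "price": 3, "qty": 2},
    {"prod": "x", "price": 3, "qty": 5},
    {"prod": "x", "price": 4, "qty": 9},
    {"prod": "y", "price": 7, "qty": 1},
]

summary = multi_feature(rows, "prod", features)
```

In plain SQL the second feature would typically require joining the base table with a view that computes the per-product minimum, on the grouping attribute. In the sketch each group is handled in isolation, so the loop over groups could in principle be distributed across workers.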