Avoiding sorting and grouping in processing queries

Authors:
Xiaoyu Wang;Mitch Cherniack
Affiliations:
Department of Computer Science, Brandeis University, Waltham, MA;Department of Computer Science, Brandeis University, Waltham, MA
Venue:
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Year:
2003

References (9):
Fundamental techniques for order optimization. SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Database Systems Concepts
Bringing order to query optimization. ACM SIGMOD Record
Access path selection in a relational database management system. SIGMOD '79 Proceedings of the 1979 ACM SIGMOD international conference on Management of data
The Implementation of POSTGRES. IEEE Transactions on Knowledge and Data Engineering
Performing Group-By before Join. Proceedings of the Tenth International Conference on Data Engineering
Hashing Methods and Relational Algebra Operations. VLDB '84 Proceedings of the 10th International Conference on Very Large Data Bases
Including Group-By in Query Optimization. VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Exploiting early sorting and early partitioning for decision support query processing. The VLDB Journal — The International Journal on Very Large Data Bases

Cited by (11):
MonetDB/XQuery: a fast XQuery processor powered by a relational engine. Proceedings of the 2006 ACM SIGMOD international conference on Management of data
A combined framework for grouping and order optimization. VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Deciding the physical implementation of ETL workflows. Proceedings of the ACM tenth international workshop on Data warehousing and OLAP
Isolating order semantics in order-sensitive xquery-to-SQL translation. BNCOD'07 Proceedings of the 24th British national conference on Databases
Fast sorting on flash memory sensor nodes. Proceedings of the Fourteenth International Database Engineering & Applications Symposium
Which sort orders are interesting? The VLDB Journal — The International Journal on Very Large Data Bases
Advanced partitioning techniques for massively distributed computation. SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Optimizing data shuffling in data-parallel computation by understanding user-defined functions. NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Sort-sharing-aware query processing. The VLDB Journal — The International Journal on Very Large Data Bases
Optimization of analytic window functions. Proceedings of the VLDB Endowment
SCOPE: parallel databases meet MapReduce. The VLDB Journal — The International Journal on Very Large Data Bases

Abstract

Sorting and grouping are among the most costly operations performed during query evaluation. System R [6] used simple inference strategies to determine the orderings that hold of intermediate relations, both to avoid unnecessary sorting and to influence join plan selection. Since then, others have proposed using integrity constraint information to infer orderings of intermediate query results. However, these proposals do not consider how to avoid grouping operations by inferring groupings, nor do they consider secondary orderings (where records in the same group satisfy some ordering). In this paper, we introduce a formalism for expressing and reasoning about order properties: ordering and grouping constraints that hold of physical representations of relations. This formalism lets us reason about how a relation is ordered or grouped, in terms of both primary and secondary orders. After formally defining order properties, we introduce a plan refinement algorithm that infers order properties for intermediate and final query results from those known to hold of query inputs, and then exploits these inferences to avoid unnecessary sorting and grouping. We then present empirical results demonstrating the benefits of plan refinement and showing that the overhead our algorithm adds to query optimization is low.
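
The idea of inferring order properties and dropping redundant sorts can be illustrated with a small sketch. This is not the paper's formalism or its plan refinement algorithm; the encoding of an order property as a list of (attribute, kind) pairs, the names `satisfies` and `refine`, and the linear plan representation are all assumptions made for illustration. The only inference rule it applies is that an ordering on an attribute implies a grouping on it, and that a property still holds when trailing secondary orders are ignored.

```python
from typing import List, Tuple

# Hypothetical encoding (not the paper's notation): an order property is a
# list of (attribute, kind) pairs, where kind is "O" (ordered) or "G" (grouped).
# Position 0 is the primary order; later positions are secondary orders that
# hold among records agreeing on all earlier attributes.
OrderProperty = List[Tuple[str, str]]


def satisfies(known: OrderProperty, required: OrderProperty) -> bool:
    """Return True if a relation with `known` also satisfies `required`."""
    if len(required) > len(known):
        return False
    for (k_attr, k_kind), (r_attr, r_kind) in zip(known, required):
        if k_attr != r_attr:
            return False
        # Ordered implies grouped, but grouped does not imply ordered.
        if r_kind == "O" and k_kind != "O":
            return False
    return True


def refine(plan):
    """Drop sort steps whose output order already holds of their input.

    `plan` is a linear pipeline of (operator, argument) steps:
      ("scan", prop):   input with a known order property
      ("filter", desc): preserves the order property
      ("sort", attrs):  sorts on the listed attributes
    """
    current: OrderProperty = []
    refined = []
    for op, arg in plan:
        if op == "scan":
            current = list(arg)
        elif op == "sort":
            wanted = [(a, "O") for a in arg]
            if satisfies(current, wanted):
                continue  # input is already sorted this way; skip the sort
            current = wanted
        refined.append((op, arg))
    return refined
```

For example, given a scan known to be ordered on `dept` and then `salary`, a later sort on `dept` alone is removed by `refine`, while a sort on `salary` alone is kept because the known primary order is on a different attribute.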