Exploiting early sorting and early partitioning for decision support query processing

Authors:
J. Claussen;A. Kemper;D. Kossmann;C. Wiesner
Affiliations:
Universität Passau, Lehrstuhl für Informatik, 94030 Passau, Germany; E-mail: {claussen,kemper,kossmann,wiesner}@db.fmi.uni-passau.de
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2000

Citing 31
Cited 10

Join processing in database systems with large main memories

ACM Transactions on Database Systems (TODS)
The EXODUS optimizer generator

SIGMOD '87 Proceedings of the 1987 ACM SIGMOD international conference on Management of data
Grammar-like functional rules for representing query optimization alternatives

SIGMOD '88 Proceedings of the 1988 ACM SIGMOD international conference on Management of data
Join processing in relational databases

ACM Computing Surveys (CSUR)
Query evaluation techniques for large databases

ACM Computing Surveys (CSUR)
TID hash joins

CIKM '94 Proceedings of the third international conference on Information and knowledge management
Multi-table joins through bitmapped join indices

ACM SIGMOD Record
Fundamental techniques for order optimization

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
On saying “Enough already!” in SQL

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Simultaneous optimization and evaluation of multiple dimensional queries

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Bitmap index design and evaluation

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Duplicate record elimination in large data files

ACM Transactions on Database Systems (TODS)
Iterative dynamic programming: a new class of query optimization algorithms

ACM Transactions on Database Systems (TODS)
Space/time trade-offs in hash coding with allowable errors

Communications of the ACM
Access path selection in a relational database management system

SIGMOD '79 Proceedings of the 1979 ACM SIGMOD international conference on Management of data
Performing Group-By before Join

Proceedings of the Tenth International Conference on Data Engineering
Sort-Merge-Join: An Idea Whose Time Has(h) Passed?

Proceedings of the Tenth International Conference on Data Engineering
Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total

ICDE '96 Proceedings of the Twelfth International Conference on Data Engineering
Hash Joins and Hash Teams in Microsoft SQL Server

VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
Diag-Join: An Opportunistic Join Algorithm for 1:N Relationships

VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
Hashing Methods and Relational Algebra Operations

VLDB '84 Proceedings of the 10th International Conference on Very Large Data Bases
Evaluating Functional Joins Along Nested Reference Sets in Object-Relational and Object-Oriented Databases

VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
Generalised Hash Teams for Join and Group-by

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Memory-Contention Responsive Hash Joins

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Including Group-By in Query Optimization

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Eager Aggregation and Lazy Aggregation

VLDB '95 Proceedings of the 21st International Conference on Very Large Data Bases
Optimization of Queries with User-defined Predicates

VLDB '96 Proceedings of the 22nd International Conference on Very Large Data Bases
Query Evaluation in CROQUE - Calculus and Algebra Coincide

BNCOD 15 Proceedings of the 15th British National Conference on Databases: Advances in Databases
Functional-join processing

The VLDB Journal — The International Journal on Very Large Data Bases
Fast joins using join indices

The VLDB Journal — The International Journal on Very Large Data Bases
Seeking the truth about ad hoc join costs

The VLDB Journal — The International Journal on Very Large Data Bases

Hyperqueries: Dynamic Distributed Query Processing on the Internet

Proceedings of the 27th International Conference on Very Large Data Bases
Building Scalable Electronic Market Places Using HyperQuery-Based Distributed Query Processing

World Wide Web
Supporting ad-hoc ranking aggregates

Proceedings of the 2006 ACM SIGMOD international conference on Management of data
AQuery: query language for ordered data, optimization techniques, and experiments

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Avoiding sorting and grouping in processing queries

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
ARCube: supporting ranking aggregate queries in partially materialized data cubes

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Ordering, distinctness, aggregation, partitioning and DQP optimization in sybase ASE 15

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Data mining techniques in materialised project and selection view

PDCAT'04 Proceedings of the 5th international conference on Parallel and Distributed Computing: applications and Technologies
Which sort orders are interesting?

The VLDB Journal — The International Journal on Very Large Data Bases
Massively parallel sort-merge joins in main memory multi-core database systems

Proceedings of the VLDB Endowment

Abstract

Decision support queries typically involve several joins, a grouping with aggregation, and/or sorting of the result tuples. We propose two new classes of query evaluation algorithms that can be used to speed up the execution of such queries. The algorithms are based on (1) early sorting and (2) early partitioning, or a combination of both. The idea is to push the sorting and/or the partitioning to the leaves, i.e., the base relations, of the query evaluation plans (QEPs) and thereby avoid sorting or partitioning the large intermediate results generated by the joins. Both early sorting and early partitioning are used in combination with hash-based algorithms for evaluating the join(s) and the grouping. To enable early sorting, the sort order generated at an early stage of the QEP is retained through an arbitrary number of so-called order-preserving hash joins. To make early partitioning applicable to a large class of decision support queries, we generalize the so-called hash teams proposed by Graefe et al. [GBC98]. Hash teams make it possible to perform several hash-based operations (join and grouping) on the same attribute in one pass without repartitioning intermediate results. Our generalization consists of indirectly partitioning the input data. Indirect partitioning means partitioning the input data on an attribute that is not directly needed for the next hash-based operation; it involves constructing bitmaps to approximate the partitioning for the attribute that is needed in the next hash-based operation. Our performance experiments show that QEPs based on early sorting, early partitioning, or both in combination perform significantly better than conventional strategies for many common classes of decision support queries.
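
The early-sorting idea can be illustrated with a minimal sketch. It is not the algorithm from the paper: the abstract does not spell out how the order-preserving hash join works, so the sketch covers only the simplest case, in which the build input fits entirely in memory and the probe input is streamed in its existing order. The relation contents, the attribute names (`cust_id`, `total`, `name`), and the function name `order_preserving_hash_join` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]


def order_preserving_hash_join(
    probe_rows: Iterable[Row],
    build_rows: Iterable[Row],
    probe_key: str,
    build_key: str,
) -> Iterator[Row]:
    """In-memory hash join that emits results in the order of probe_rows.

    Simplifying assumption: the build input fits in memory, so no
    partitions are spilled and no merge step is needed.
    """
    # Build phase: hash the (small) build input on its join attribute.
    table: Dict[object, List[Row]] = defaultdict(list)
    for b in build_rows:
        table[b[build_key]].append(b)

    # Probe phase: stream the probe input in its given order. Each probe
    # row emits its matches immediately, so the probe order is retained.
    for p in probe_rows:
        for b in table.get(p[probe_key], ()):
            yield {**b, **p}


# Hypothetical base relations.
orders = [
    {"order_id": 1, "cust_id": 20, "total": 300},
    {"order_id": 2, "cust_id": 10, "total": 50},
    {"order_id": 3, "cust_id": 20, "total": 120},
]
customers = [
    {"cust_id": 10, "name": "Ada"},
    {"cust_id": 20, "name": "Bob"},
]

# Early sorting: sort the base relation once, before the join, instead of
# sorting the (usually larger) join result afterwards.
orders_sorted = sorted(orders, key=lambda r: r["total"], reverse=True)

result = list(
    order_preserving_hash_join(orders_sorted, customers, "cust_id", "cust_id")
)
totals = [r["total"] for r in result]
names = [r["name"] for r in result]
print(totals, names)
```

Because the probe side is consumed sequentially, the join output is already ordered by `total` and no sort is needed above the join. When the build input does not fit in memory, this property no longer comes for free, which is presumably the case the order-preserving hash joins of the paper are designed to handle.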