Sort vs. Hash Revisited

Authors:
G. Graefe; A. Linville; L. D. Shapiro
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
1994

Abstract

Efficient algorithms for processing large volumes of data are important for both relational and new object-oriented database systems. Many query-processing operations, e.g., intersection, join, and duplicate elimination, can be implemented using either sort- or hash-based algorithms. Early relational database systems employed only sort-based algorithms. In the last decade, hash-based algorithms have gained acceptance and popularity, and they are often considered generally superior to sort-based algorithms such as merge-join. In this article, we compare the concepts behind sort- and hash-based query-processing algorithms and conclude that (1) many dualities exist between the two types of algorithms, (2) their costs differ mostly by percentages rather than by factors, (3) several special cases exist that favor one or the other choice, and (4) there is a strong reason why both hash- and sort-based algorithms should be available in a query-processing system. Our conclusions are supported by experiments performed using the Volcano query execution engine.
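
To make the sort/hash pairing concrete, the following is a minimal in-memory sketch of two of the operations named in the abstract (duplicate elimination and join), each written once with sorting and once with hashing. It is an illustration only and is not taken from the article or from Volcano: the function names, the `(key, payload)` tuple layout, and the assumption that all inputs fit in memory are hypothetical choices made for this example.

```python
from collections import defaultdict
from itertools import groupby


def distinct_by_sort(rows):
    """Sort the input, then keep one representative per run of equal rows."""
    return [row for row, _ in groupby(sorted(rows))]


def distinct_by_hash(rows):
    """Keep a row only the first time it is seen in a hash set."""
    seen = set()
    out = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out


def merge_join(left, right):
    """Equi-join of (key, payload) tuples: sort both inputs, then merge."""
    left = sorted(left, key=lambda r: r[0])
    right = sorted(right, key=lambda r: r[0])
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the end of the group of right rows sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row in the group with every right row in it.
            while i < len(left) and left[i][0] == lk:
                for _, r_payload in right[j:j_end]:
                    out.append((lk, left[i][1], r_payload))
                i += 1
            j = j_end
    return out


def hash_join(left, right):
    """Equi-join of (key, payload) tuples: build a table on left, probe with right."""
    table = defaultdict(list)
    for key, payload in left:
        table[key].append(payload)
    out = []
    for key, r_payload in right:
        for l_payload in table.get(key, []):
            out.append((key, l_payload, r_payload))
    return out
```

Both variants of each operation produce the same set of result rows; they differ in output order (the sort-based versions emit rows in key order, the hash-based versions in input or probe order) and in how the work is organized. This sketch does not model disk I/O, memory limits, or any of the cost comparisons the article reports.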