Design and evaluation of main memory hash join algorithms for multi-core CPUs

Authors:
Spyros Blanas; Yinan Li; Jignesh M. Patel
Affiliations:
University of Wisconsin-Madison, Madison, WI, USA; University of Wisconsin-Madison, Madison, WI, USA; University of Wisconsin-Madison, Madison, WI, USA
Venue:
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Year:
2011

Abstract

This paper investigates efficient hash join algorithms for modern multi-core processors in main memory environments. It dissects each internal phase of a typical hash join algorithm and considers different alternatives for implementing each phase, producing a family of hash join algorithms. We then implement these main memory algorithms on two radically different modern multi-processor systems and carefully examine the factors that affect the performance of each method. Our analysis reveals an interesting result: a very simple hash join algorithm is very competitive with the other, more complex methods. This simple join algorithm builds a shared hash table and does not partition the input relations. Because it is simple, it requires fewer parameter settings, which makes it far easier for query optimizers and execution engines to use in practice. Furthermore, the performance of this simple algorithm improves dramatically as the skew in the input data increases, and it quickly starts to outperform all other algorithms.
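
To make the "shared hash table, no partitioning" idea concrete, the following is a minimal Python sketch, not the authors' implementation. The function name `no_partition_hash_join`, the per-bucket locks, the bucket count, and the `(key, payload)` tuple layout are all assumptions chosen for illustration; the abstract does not specify any of them. Python threads also do not run CPU-bound work in parallel in the standard interpreter, so the sketch shows only the structure of the two phases, not their performance.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def no_partition_hash_join(build, probe, num_threads=4, num_buckets=1024):
    """Equi-join two lists of (key, payload) tuples without partitioning them.

    Hypothetical illustration: every thread works on a slice of the input
    and all threads share a single hash table.
    """
    # One shared table: each bucket is a list guarded by its own lock (assumed design).
    buckets = [[] for _ in range(num_buckets)]
    locks = [threading.Lock() for _ in range(num_buckets)]

    def chunks(rel):
        # Split a relation into contiguous, roughly equal slices, one per thread.
        step = max(1, -(-len(rel) // num_threads))
        return [rel[i:i + step] for i in range(0, len(rel), step)]

    def build_chunk(chunk):
        # Build phase: insert each tuple of the build relation into the shared table.
        for key, payload in chunk:
            b = hash(key) % num_buckets
            with locks[b]:
                buckets[b].append((key, payload))

    def probe_chunk(chunk):
        # Probe phase: the table is read-only here, so no locking is needed.
        out = []
        for key, payload in chunk:
            for bkey, bpayload in buckets[hash(key) % num_buckets]:
                if bkey == key:
                    out.append((key, bpayload, payload))
        return out

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Consuming the map result waits for every build task before probing starts.
        list(pool.map(build_chunk, chunks(build)))
        parts = list(pool.map(probe_chunk, chunks(probe)))
    return [row for part in parts for row in part]
```

A call such as `no_partition_hash_join([(1, "a"), (2, "b")], [(2, "x")])` joins the two inputs on their first field. The only tuning knobs in this sketch are the thread count and the bucket count, which loosely mirrors the abstract's point that the simple algorithm needs few parameter settings.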