Massively parallel sort-merge joins in main memory multi-core database systems

Authors:
Martina-Cezara Albutiu; Alfons Kemper; Thomas Neumann
Affiliations:
Technische Universität München, Garching, Germany (all authors)
Venue:
Proceedings of the VLDB Endowment
Year:
2012

Abstract

Two emerging hardware trends will dominate database system technology in the near future: main memory capacities growing to several TB per server, and massively parallel multi-core processing. Many algorithmic and control techniques in current database technology were devised for disk-based systems, where I/O dominated performance. In this work we take a new look at the well-known sort-merge join, which so far has not been a focus of research on scalable, massively parallel multi-core data processing because it was deemed inferior to hash joins. We devise a suite of new massively parallel sort-merge (MPSM) join algorithms based on partial partition-based sorting. In contrast to classical sort-merge joins, our MPSM algorithms do not rely on a hard-to-parallelize final merge step to create one complete sort order; rather, they work on the independently created runs in parallel. As a result, our MPSM algorithms are NUMA-affine, as all sorting is carried out on local memory partitions. An extensive experimental evaluation on a modern 32-core machine with 1 TB of main memory demonstrates the competitive performance of MPSM on large main memory databases with billions of objects. MPSM scales (almost) linearly in the number of employed cores and clearly outperforms competing hash join proposals; in particular, it outperforms the "cutting-edge" Vectorwise parallel query engine by a factor of four.
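
The idea of joining independently sorted runs without a final global merge can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`split_chunks`, `sort_run`, `merge_join`, `mpsm_join_sketch`), the tuple layout `(key, payload)`, and the use of a thread pool are all assumptions made for illustration. Each worker sorts only its own chunk, and each worker then merge-joins its run of one input against every run of the other input. In CPython the threads show the structure of the parallelism rather than an actual speedup, and NUMA placement is not modeled at all.

```python
from concurrent.futures import ThreadPoolExecutor
from operator import itemgetter


def split_chunks(rows, n):
    """Split rows into n nearly equal contiguous chunks (hypothetical helper)."""
    size, rem = divmod(len(rows), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < rem else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks


def sort_run(chunk):
    """Sort one chunk locally by join key; the result is one independent run."""
    return sorted(chunk, key=itemgetter(0))


def merge_join(r_run, s_run):
    """Merge-join two key-sorted runs, emitting all pairs for duplicate keys."""
    out = []
    i = j = 0
    while i < len(r_run) and j < len(s_run):
        rk, sk = r_run[i][0], s_run[j][0]
        if rk < sk:
            i += 1
        elif rk > sk:
            j += 1
        else:
            # Find the block of equal keys in s_run, then pair it with
            # every r_run tuple that carries the same key.
            j_end = j
            while j_end < len(s_run) and s_run[j_end][0] == rk:
                j_end += 1
            while i < len(r_run) and r_run[i][0] == rk:
                for k in range(j, j_end):
                    out.append((rk, r_run[i][1], s_run[k][1]))
                i += 1
            j = j_end
    return out


def mpsm_join_sketch(r, s, workers=4):
    """Equi-join r and s on tuple element 0 using per-worker sorted runs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Phase 1: every worker sorts its own chunk; no global sort order.
        s_runs = list(pool.map(sort_run, split_chunks(s, workers)))
        r_runs = list(pool.map(sort_run, split_chunks(r, workers)))

        # Phase 2: each worker joins its r run against all s runs.
        def join_worker(r_run):
            matches = []
            for s_run in s_runs:
                matches.extend(merge_join(r_run, s_run))
            return matches

        parts = list(pool.map(join_worker, r_runs))
    return [row for part in parts for row in part]
```

Because every tuple of one input lives in exactly one run and is compared against all runs of the other input, the union of the per-worker results is the full join, even though no single sorted order over either input is ever built.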