Improving hash join performance through prefetching

Authors:
Shimin Chen;Anastassia Ailamaki;Phillip B. Gibbons;Todd C. Mowry
Affiliations:
Intel Research Pittsburgh, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA;Intel Research Pittsburgh, Pittsburgh, PA;Carnegie Mellon University and Intel Research Pittsburgh, Pittsburgh, PA
Venue:
ACM Transactions on Database Systems (TODS)
Year:
2007

Abstract

Hash join algorithms suffer from extensive CPU cache stalls. This article shows that the standard hash join algorithm for disk-oriented databases (i.e., GRACE) spends over 80% of its user time stalled on CPU cache misses, and explores the use of CPU cache prefetching to improve its cache performance. Applying prefetching to hash joins is complicated by the data dependencies, multiple code paths, and inherent randomness of hashing. We present two techniques, group prefetching and software-pipelined prefetching, that overcome these complications. These schemes achieve 1.29-4.04X speedups for the join phase and 1.37-3.49X speedups for the partition phase over GRACE and simple prefetching approaches. Moreover, compared with previous cache-aware approaches (i.e., cache partitioning), the schemes are at least 36% faster on large relations and do not require exclusive use of the CPU cache to be effective. Finally, when disk I/Os are included and elapsed real times are compared, our cache prefetching schemes achieve 1.12-1.84X speedups for the join phase and 1.06-1.60X speedups for the partition phase over the GRACE hash join algorithm.
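The abstract names group prefetching without showing its shape, so the following is a minimal sketch of the general idea applied to the probe side of a join, not the article's actual implementation. It assumes a simple chained hash table and a GCC/Clang compiler (for `__builtin_prefetch`); the names `HashTable`, `ht_build`, `probe_simple`, `probe_group`, and the `GROUP_SIZE` value are all hypothetical. Instead of finishing one probe before starting the next, the loop handles a group of probes stage by stage, so that the cache miss issued for one tuple can overlap with work on the others.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define GROUP_SIZE 16 /* hypothetical tuning knob: probes handled per group */

typedef struct Node {
    uint32_t key;
    struct Node *next;
} Node;

typedef struct {
    Node **buckets;  /* array of chain heads */
    size_t nbuckets; /* assumed to be a power of two */
} HashTable;

static size_t hash_key(uint32_t key, size_t nbuckets) {
    return (size_t)(key * 2654435761u) & (nbuckets - 1);
}

/* Build: insert each key at the head of its chain. Returns 0 on success. */
static int ht_build(HashTable *ht, const uint32_t *keys, size_t n,
                    size_t nbuckets) {
    assert(nbuckets != 0 && (nbuckets & (nbuckets - 1)) == 0);
    ht->buckets = calloc(nbuckets, sizeof *ht->buckets);
    ht->nbuckets = ht->buckets ? nbuckets : 0;
    if (!ht->buckets) return -1;
    for (size_t i = 0; i < n; i++) {
        Node *node = malloc(sizeof *node);
        if (!node) return -1;
        size_t b = hash_key(keys[i], nbuckets);
        node->key = keys[i];
        node->next = ht->buckets[b];
        ht->buckets[b] = node;
    }
    return 0;
}

static void ht_free(HashTable *ht) {
    for (size_t b = 0; b < ht->nbuckets; b++) {
        Node *p = ht->buckets[b];
        while (p) { Node *next = p->next; free(p); p = next; }
    }
    free(ht->buckets);
    ht->buckets = NULL;
    ht->nbuckets = 0;
}

/* Baseline: one probe at a time; each dependent load may stall on a miss. */
static size_t probe_simple(const HashTable *ht, const uint32_t *keys, size_t n) {
    size_t matches = 0;
    for (size_t i = 0; i < n; i++)
        for (const Node *p = ht->buckets[hash_key(keys[i], ht->nbuckets)];
             p; p = p->next)
            matches += (p->key == keys[i]);
    return matches;
}

/* Group version: run each stage across the whole group before moving on. */
static size_t probe_group(const HashTable *ht, const uint32_t *keys, size_t n) {
    size_t matches = 0;
    size_t slot[GROUP_SIZE];
    const Node *head[GROUP_SIZE];
    for (size_t base = 0; base < n; base += GROUP_SIZE) {
        size_t g = (n - base < GROUP_SIZE) ? n - base : GROUP_SIZE;
        /* Stage 1: hash every key in the group, prefetch its bucket slot. */
        for (size_t j = 0; j < g; j++) {
            slot[j] = hash_key(keys[base + j], ht->nbuckets);
            __builtin_prefetch(&ht->buckets[slot[j]], 0, 3);
        }
        /* Stage 2: read each chain head, prefetch the first node. */
        for (size_t j = 0; j < g; j++) {
            head[j] = ht->buckets[slot[j]];
            if (head[j]) __builtin_prefetch(head[j], 0, 3);
        }
        /* Stage 3: walk each chain and count matching keys. */
        for (size_t j = 0; j < g; j++)
            for (const Node *p = head[j]; p; p = p->next)
                matches += (p->key == keys[base + j]);
    }
    return matches;
}
```

Both probe functions return the same match count; only the memory access schedule differs. This sketch prefetches just the first node of each chain, so longer chains still incur dependent misses in the last stage, and a suitable group size depends on the memory latency and per-tuple work of the target machine.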