Join processing in database systems with large main memories
ACM Transactions on Database Systems (TODS)
An adaptive hash join algorithm for multiuser environments
Proceedings of the sixteenth international conference on Very large databases
Loop distribution with arbitrary control flow
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
An effective on-chip preloading scheme to reduce data access penalty
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Query evaluation techniques for large databases
ACM Computing Surveys (CSUR)
Tolerating latency through software-controlled data prefetching
Tolerating latency through software-controlled data prefetching
Compiler-based prefetching for recursive data structures
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Automatic Compiler-Inserted Prefetching for Pointer-Based Applications
IEEE Transactions on Computers - Special issue on cache memory and related problems
Proceedings of the 27th annual international symposium on Computer architecture
Improving index performance through prefetching
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Fractal prefetching B+-Trees: optimizing both cache and disk performance
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Database Architecture Optimized for the New Bottleneck: Memory Access
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Hash-Partitioned Join Method Using Dynamic Destaging Strategy
VLDB '88 Proceedings of the 14th International Conference on Very Large Data Bases
What Happens During a Join? Dissecting CPU and Memory Optimization Effects
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Cache Conscious Algorithms for Relational Query Processing
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
A systolic array optimizing compiler
A systolic array optimizing compiler
Improving Hash Join Performance through Prefetching
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Improving database performance on simultaneous multithreading processors
VLDB '05 Proceedings of the 31st international conference on Very large data bases
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Accelerating database operators using a network processor
DaMoN '05 Proceedings of the 1st international workshop on Data management on new hardware
Introduction to the cell multiprocessor
IBM Journal of Research and Development - POWER5 and packaging
Redesigning database systems in light of cpu cache prefetching
Redesigning database systems in light of cpu cache prefetching
Breaking the memory wall in MonetDB
Communications of the ACM - Surviving the data deluge
DSM vs. NSM: CPU performance tradeoffs in block-oriented query processing
Proceedings of the 4th international workshop on Data management on new hardware
Database architecture evolution: mammals flourished long before dinosaurs became extinct
Proceedings of the VLDB Endowment
MCC-DB: minimizing cache conflicts in multi-core processors for databases
Proceedings of the VLDB Endowment
The DataPath system: a data-centric analytic processing engine for large data warehouses
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Performance improvement of join queries through algebraic signatures
International Journal of Intelligent Information and Database Systems
Data structures for the most frequently used algorithm
Journal of Computing Sciences in Colleges
When Prefetching Works, When It Doesn’t, and Why
ACM Transactions on Architecture and Code Optimization (TACO)
Massively parallel sort-merge joins in main memory multi-core database systems
Proceedings of the VLDB Endowment
Micro adaptivity in Vectorwise
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Memory footprint matters: efficient equi-join algorithms for main memory data processing
Proceedings of the 4th annual Symposium on Cloud Computing
Revisiting co-processing for hash joins on the coupled CPU-GPU architecture
Proceedings of the VLDB Endowment
OmniDB: towards portable and efficient query processing on parallel CPU/GPU architectures
Proceedings of the VLDB Endowment
Meet the walkers: accelerating index traversals for in-memory databases
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Eliminating unscalable communication in transaction processing
The VLDB Journal — The International Journal on Very Large Data Bases
Hi-index | 0.00 |
Hash join algorithms suffer from extensive CPU cache stalls. This article shows that the standard hash join algorithm for disk-oriented databases (i.e. GRACE) spends over 80% of its user time stalled on CPU cache misses, and explores the use of CPU cache prefetching to improve its cache performance. Applying prefetching to hash joins is complicated by the data dependencies, multiple code paths, and inherent randomness of hashing. We present two techniques, group prefetching and software-pipelined prefetching, that overcome these complications. These schemes achieve 1.29--4.04X speedups for the join phase and 1.37--3.49X speedups for the partition phase over GRACE and simple prefetching approaches. Moreover, compared with previous cache-aware approaches (i.e. cache partitioning), the schemes are at least 36% faster on large relations and do not require exclusive use of the CPU cache to be effective. Finally, comparing the elapsed real times when disk I/Os are in the picture, our cache prefetching schemes achieve 1.12--1.84X speedups for the join phase and 1.06--1.60X speedups for the partition phase over the GRACE hash join algorithm.