Redesigning database systems in light of cpu cache prefetching

Authors:
Anastassia Ailamaki;Todd C. Mowry;Shimin Chen
Affiliations:
Carnegie Mellon University;Carnegie Mellon University;Carnegie Mellon University
Venue:
Redesigning database systems in light of cpu cache prefetching
Year:
2005

Citing 0
Cited 1

Improving hash join performance through prefetching

ACM Transactions on Database Systems (TODS)

Quantified Score

Hi-index	0.00

Visualization

Abstract

Computer systems have enjoyed an exponential growth in processor speed for the past 20 years, while main memory speed has improved only moderately. Today a cache miss to main memory takes hundreds of processor cycles. Recent studies have demonstrated that on commercial databases, about 50% or more of execution time in memory is often wasted due to cache misses. In light of this problem, a number of recent studies focused on reducing the number of cache misses of database algorithms. In this thesis, we investigate a different approach: reducing the impact of cache misses through a technique called cache prefetching. Since prefetching for sequential array accesses has been well studied, we are interested in studying non-contiguous access patterns found in two classes of database algorithms: the B+-Tree index algorithm and the hash join algorithm. We re-examine their designs with cache prefetching in mind, and combine prefetching and data locality optimizations to achieve good cache performance. For B+-Trees, we first propose and evaluate a novel main memory index structure, Prefetching B+Trees, which uses prefetching to accelerate two major access patterns of B+-Tree indices: searches and range scans. We then apply our findings in the development of a novel index structure, Fractal Prefetching B+-Trees, that optimizes index operations both for CPU cache performance and for disk performance in commercial database systems by intelligently embedding cache-optimized trees into disk pages. For hash joins, we first exploit cache prefetching separately for the I/O partition phase and the join phase of the algorithm. We propose and evaluate two techniques, Group Prefetching and Software-Pipelined Prefetching, that exploit inter-tuple parallelism to overlap cache misses across the processing of multiple tuples. Then we present a novel algorithm, Inspector Joins, that exploits the free information obtained from one pass of the hash join algorithm to improve the performance of a later pass. This new algorithm addresses the memory bandwidth sharing problem in shared-bus multiprocessor systems. We compare our techniques against state-of-the-art cache-friendly algorithms for B+-Trees and hash joins through both simulation studies and real machine experiments. Our experimental results demonstrate dramatic performance benefits of our cache prefetching enabled techniques.