Call graph prefetching for database applications

  • Authors:
  • Murali Annavaram;Jignesh M. Patel;Edward S. Davidson

  • Affiliations:
  • Intel Corporation, Santa Clara, CA;The University of Michigan, Ann Arbor, MI;The University of Michigan, Ann Arbor, MI

  • Venue:
  • ACM Transactions on Computer Systems (TOCS)
  • Year:
  • 2003

Abstract

With the continuing technological trend of ever cheaper and larger memory, most data sets in database servers will soon be able to reside in main memory. In this configuration, the performance bottleneck is likely to be the gap between the processing speed of the CPU and the memory access latency. Previous work has shown that database applications have large instruction and data footprints and hence do not use processor caches effectively. In this paper, we propose Call Graph Prefetching (CGP), an instruction prefetching technique that analyzes the call graph of a database system and prefetches instructions from the function that is deemed likely to be called next. CGP capitalizes on the highly predictable function call sequences that are typical of database systems. CGP can be implemented either in software or in hardware. The software-based CGP (CGP_S) uses profile information to build a call graph, and uses the predictable call sequences in the call graph to determine which function to prefetch next. The hardware-based CGP (CGP_H) uses a hardware table, called the Call Graph History Cache (CGHC), to dynamically store sequences of functions invoked during program execution, and uses that stored history when choosing which functions to prefetch. We evaluate the performance of CGP on sets of Wisconsin and TPC-H queries, as well as on CPU-2000 benchmarks. For most CPU-2000 applications the number of instruction cache (I-cache) misses was very small even without any prefetching, obviating the need for CGP. On the other hand, the database workloads do suffer a significant number of I-cache misses; CGP_S improves their performance by 23% and CGP_H by 26% over a baseline system that has already been highly tuned for efficient I-cache usage by using the OM tool. CGP, with or without OM, reduces the I-cache miss stall time by about 50% relative to O5+OM, taking us about half way from an already highly tuned baseline system toward perfect I-cache performance.
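To illustrate the mechanism the abstract describes, the following is a minimal software sketch of the Call Graph History Cache idea: for each function, remember the sequence of callees observed on its previous execution, and on each call predict (i.e., prefetch) the function that followed it last time. This is an assumption-laden toy model, not the paper's hardware design; the class name, function names, and the single-sequence history are all illustrative.

```python
from collections import defaultdict

class CallGraphHistoryCache:
    """Toy model of the CGHC concept (illustrative, not the paper's
    exact hardware table): map each function to the sequence of callees
    it invoked on its previous execution, and use that history to
    predict which function to prefetch next."""

    def __init__(self):
        self.history = {}                   # function -> callee sequence seen last time
        self.recording = defaultdict(list)  # callee sequence seen this time
        self.call_index = defaultdict(int)  # position within the caller's sequence

    def on_call(self, caller, callee):
        """Record the call edge and return the predicted *next* function
        to prefetch: the callee that followed this one last time."""
        self.recording[caller].append(callee)
        idx = self.call_index[caller]
        self.call_index[caller] = idx + 1
        prev = self.history.get(caller, [])
        return prev[idx + 1] if idx + 1 < len(prev) else None

    def on_return(self, func):
        """On return, commit this execution's callee sequence as history."""
        self.history[func] = self.recording.pop(func, [])
        self.call_index.pop(func, None)

cghc = CallGraphHistoryCache()
# First execution of a hypothetical query function: scan -> predicate -> emit.
for callee in ["scan", "predicate", "emit"]:
    cghc.on_call("query", callee)
cghc.on_return("query")
# Second execution: after "scan" is invoked, the history predicts "predicate",
# so its instructions could be prefetched before the call occurs.
print(cghc.on_call("query", "scan"))  # → predicate
```

The prediction succeeds exactly when the call sequence repeats, which is the property of database engines that CGP exploits.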