Call graph prefetching for database applications

Authors:
Murali Annavaram;Jignesh M. Patel;Edward S. Davidson
Affiliations:
Intel Corporation, Santa Clara, CA;The University of Michigan, Ann Arbor, MI;The University of Michigan, Ann Arbor, MI
Venue:
ACM Transactions on Computer Systems (TOCS)
Year:
2003

Citing 30
Cited 7

Profile guided code positioning

PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
AlphaSort: a RISC machine sort

SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data
Shoring up persistent applications

SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data
Characterization of alpha AXP performance using TP and SPEC workloads

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Commercial workload performance in the IBM POWER2 RISC System/6000 processor

IBM Journal of Research and Development
Contrasting characteristics and cache performance of technical and multi-user commercial workloads

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Evaluation of multithreaded uniprocessors for commercial application environments

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Wrong-path instruction prefetching

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Efficient procedure mapping using cache line coloring

Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Prefetching using Markov predictors

Proceedings of the 24th annual international symposium on Computer architecture
Procedure placement using temporal ordering information

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
An analysis of database workload performance on simultaneous multithreaded processors

Proceedings of the 25th annual international symposium on Computer architecture
A Performance Study of Instruction Cache Prefetching Methods

IEEE Transactions on Computers
The Asilomar report on database research

ACM SIGMOD Record
Fetch directed instruction prefetching

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Architectural and compiler support for effective instruction prefetching: a cooperative approach

ACM Transactions on Computer Systems (TOCS)
Data prefetching by dependence graph precomputation

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
The Drill Down Benchmark

VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
DBMSs on a Modern Processor: Where Does Time Go?

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Benchmarking Database Systems A Systematic Approach

VLDB '83 Proceedings of the 9th International Conference on Very Large Data Bases
Cache Conscious Algorithms for Relational Query Processing

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
The Memory Performance of DSS Commercial Workloads in Shared-Memory Multiprocessors

HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
Temporal-Based Procedure Reordering for Improved Instruction Cache Performance

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Instruction prefetching using branch prediction information

ICCD '97 Proceedings of the 1997 International Conference on Computer Design (ICCD '97)
Call Graph Prefetching for Database Applications

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Branch History Guided Instruction Prefetching

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Prefetch mechanisms that acquire and exploit application specific knowledge

Prefetch mechanisms that acquire and exploit application specific knowledge
A Prefetch Taxonomy

IEEE Transactions on Computers
Instrumentation and optimization of Win32/intel executables using Etch

NT'97 Proceedings of the USENIX Windows NT Workshop on The USENIX Windows NT Workshop 1997

Improving instruction cache performance in OLTP

ACM Transactions on Database Systems (TODS)
Steps towards cache-resident transaction processing

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Temporal instruction fetch streaming

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Global-aware and multi-order context-based prefetching for high-performance processors

International Journal of High Performance Computing Applications
Proactive instruction fetch

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
RDIP: return-address-stack directed instruction prefetching

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
SHIFT: shared history instruction fetch for lean-core server processors

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture

Quantified Score

Hi-index	0.00

Visualization

Abstract

With the continuing technological trend of ever cheaper and larger memory, most data sets in database servers will soon be able to reside in main memory. In this configuration, the performance bottleneck is likely to be the gap between the processing speed of the CPU and the memory access latency. Previous work has shown that database applications have large instruction and data footprints and hence do not use processor caches effectively. In this paper, we propose Call Graph Prefetching (CGP), an N instruction prefetching technique that analyzes the call graph of a database system and prefetches instructions from the function that is deemed likely to be called next. CGP capitalizes on the highly predictable function call sequences that are typical of database systems. CGP can be implemented either in software or in hardware. The software-based CGP (CGP_S) uses profile information to build a call graph, and uses the predictable call sequences in the call graph to determine which function to prefetch next. The hardware-based CGP(CGP_H) uses a hardware table, called the Call Graph History Cache (CGHC), to dynamically store sequences of functions invoked during program execution, and uses that stored history when choosing which functions to prefetch.We evaluate the performance of CGP on sets of Wisconsin and TPC-H queries, as well as on CPU-2000 benchmarks. For most CPU-2000 applications the number of instruction cache (I-cache) misses were very few even without any prefetching, obviating the need for CGP. On the other hand, the database workloads do suffer a significant number of I-cache misses; CGP_S improves their performance by 23% and CGP_H by 26% over a baseline system that has already been highly tuned for efficient I-cache usage by using the OM tool. CGP, with or without OM, reduces the I-cache miss stall time by about 50% relative to O5+OM, taking us about half way from an already highly tuned baseline system toward perfect I-cache performance.