Effective Instruction Prefetching in Chip Multiprocessors for Modern Commercial Applications

Authors:
Lawrence Spracklen;Yuan Chou;Santosh G. Abraham
Affiliations:
Sun Microsystems Inc., Sunnyvale, CA;Sun Microsystems Inc., Sunnyvale, CA;Sun Microsystems Inc., Sunnyvale, CA
Venue:
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Year:
2005

Citing 0
Cited 15

Montecito: A Dual-Core, Dual-Thread Itanium Processor

IEEE Micro
Store Memory-Level Parallelism Optimizations for Commercial Applications

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Characteristics of workloads used in high performance and technical computing

Proceedings of the 21st annual international conference on Supercomputing
Adaptive set pinning: managing shared caches in chip multiprocessors

Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
A novel migration-based NUCA design for chip multiprocessors

Proceedings of the 2008 ACM/IEEE conference on Supercomputing
A compiler-directed data prefetching scheme for chip multiprocessors

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Temporal instruction fetch streaming

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Adaptive prefetching for shared cache based chip multiprocessors

Proceedings of the Conference on Design, Automation and Test in Europe
Coterminous locality and coterminous group data prefetching on chip-multiprocessors

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Studying the impact of hardware prefetching and bandwidth partitioning in chip-multiprocessors

Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Studying the impact of hardware prefetching and bandwidth partitioning in chip-multiprocessors

ACM SIGMETRICS Performance Evaluation Review - Performance evaluation review
Proactive instruction fetch

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Cost Minimization with HPDFG and Data Mining for Heterogeneous DSP

Journal of Signal Processing Systems
RDIP: return-address-stack directed instruction prefetching

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
SHIFT: shared history instruction fetch for lean-core server processors

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this paper, we study the instruction cache miss behavior of four modern commercial applications (a database workload, TPC-W, SPECjAppServer2002 and SPECweb99). These applications exhibit high instruction cache miss rates for both the L1 and L2 caches, and a sizable performance improvement can be achieved by eliminating these misses. We show that it is important, not only to address sequential misses, but also misses due to branches and function calls. As a result, we propose an efficient discontinuity prefetching scheme that can be effectively combined with traditional sequential prefetching to address all forms of instruction cache misses. Additionally, with the emergence of chip multiprocessors (CMPs), instruction prefetching schemes must take into account their effect on the shared L2 cache. Specifically, aggressive instruction cache prefetching can result in an increase in the number of L2 cache data misses. As a solution, we propose a scheme that does not install prefetches into the L2 cache unless they are proven to be useful. Overall, we demonstrate that the combination of our proposed schemes is successful in reducing the instruction miss rate to only 10%-16% of the original miss rate and results in a 1.08X-1.37X performance improvement for the applications studied.