Instruction fetching: coping with code bloat

  • Authors:
  • Richard Uhlig; David Nagle; Trevor Mudge; Stuart Sechrest; Joel Emer

  • Affiliations:
  • Gesellschaft für Mathematik und Datenverarbeitung (GMD), Schloß Birlinghoven, 53757 Sankt Augustin, Germany; Department of ECE, Carnegie Mellon University, Pittsburgh, PA; EECS Department, University of Michigan, 1301 Beal Ave., Ann Arbor, Michigan; EECS Department, University of Michigan, 1301 Beal Ave., Ann Arbor, Michigan; Digital Equipment Corporation, 77 Reed Road HLO2-3/J3, Hudson, MA

  • Venue:
  • ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
  • Year:
  • 1995

Quantified Score

Hi-index 0.01

Abstract

Previous research has shown that the SPEC benchmarks achieve low miss ratios in relatively small instruction caches. This paper presents evidence that current software-development practices produce applications that exhibit substantially higher instruction-cache miss ratios than do the SPEC benchmarks. To represent these trends, we have assembled a collection of applications, called the Instruction Benchmark Suite (IBS), that provides a better test of instruction-cache performance. We discuss the rationale behind the design of IBS and characterize its behavior relative to the SPEC benchmark suite. Our analysis is based on trace-driven and trap-driven simulations and takes into full account both the application and operating-system components of the workloads.

This paper then reexamines a collection of previously-proposed hardware mechanisms for improving instruction-fetch performance in the context of the IBS workloads. We study the impact of cache organization, transfer bandwidth, prefetching, and pipelined memory systems on machines that rely on the use of relatively small primary instruction caches to facilitate increased clock rates. We find that, although of little use for SPEC, the right combination of these techniques substantially benefits IBS. Even so, under IBS, a stubborn lower bound on the instruction-fetch CPI remains as an obstacle to improving overall processor performance.
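
As a rough illustration of the kind of measurement the abstract describes, the sketch below replays a list of instruction-fetch addresses through a small set-associative cache and converts the resulting miss count into an instruction-fetch CPI contribution. It is not the authors' simulator: the function names, the LRU replacement policy, the default cache geometry (8 KB, 32-byte lines), and the fixed miss penalty are all assumptions chosen for illustration, and the trace is synthetic.

```python
def simulate_icache(trace, cache_bytes=8192, line_bytes=32, assoc=1):
    """Replay fetch addresses through an LRU set-associative cache.

    Returns (misses, accesses). All parameters are illustrative
    assumptions, not values taken from the paper.
    """
    num_sets = cache_bytes // (line_bytes * assoc)
    sets = [[] for _ in range(num_sets)]  # per set: resident tags, MRU last
    misses = 0
    accesses = 0
    for addr in trace:
        accesses += 1
        block = addr // line_bytes
        index = block % num_sets
        tag = block // num_sets
        ways = sets[index]
        if tag in ways:
            # Hit: move the tag to the most-recently-used position.
            ways.remove(tag)
            ways.append(tag)
        else:
            # Miss: evict the least-recently-used tag if the set is full.
            misses += 1
            if len(ways) >= assoc:
                ways.pop(0)
            ways.append(tag)
    return misses, accesses


def fetch_cpi(misses, instructions, miss_penalty_cycles):
    """Cycles per instruction lost to instruction-cache misses,
    assuming a fixed (hypothetical) penalty per miss."""
    return misses * miss_penalty_cycles / instructions


# Synthetic trace: a 32-instruction loop (4-byte instructions) run 10 times.
loop_trace = [pc for _ in range(10) for pc in range(0, 128, 4)]
loop_misses, loop_accesses = simulate_icache(loop_trace)

# Synthetic trace: two addresses that map to the same direct-mapped set.
conflict_trace = [0, 8192] * 5
conflict_dm, _ = simulate_icache(conflict_trace, assoc=1)
conflict_2way, _ = simulate_icache(conflict_trace, assoc=2)
```

In this toy setup the loop touches only four cache lines, so it misses once per line and then hits, while the two conflicting addresses miss on every access in the direct-mapped configuration and only twice with two-way associativity. A workload with a larger instruction footprint would show a correspondingly higher miss count and fetch CPI under the same model.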