Instruction fetching: coping with code bloat

Authors:
Richard Uhlig;David Nagle;Trevor Mudge;Stuart Sechrest;Joel Emer
Affiliations:
Gesellschaft für Mathematik und Datenverarbeitung (GMD), Schloß Birlinghoven, 53757 Sankt Augustin, Germany;Department of ECE, Carnegie Mellon University, Pittsburgh, PA;EECS Department, University of Michigan, 1301 Beal Ave., Ann Arbor, Michigan;EECS Department, University of Michigan, 1301 Beal Ave., Ann Arbor, Michigan;Digital Equipment Corporation, 77 Reed Road HLO2-3/J3, Hudson, MA
Venue:
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Year:
1995

Abstract

Previous research has shown that the SPEC benchmarks achieve low miss ratios in relatively small instruction caches. This paper presents evidence that current software-development practices produce applications that exhibit substantially higher instruction-cache miss ratios than do the SPEC benchmarks. To represent these trends, we have assembled a collection of applications, called the Instruction Benchmark Suite (IBS), that provides a better test of instruction-cache performance. We discuss the rationale behind the design of IBS and characterize its behavior relative to the SPEC benchmark suite. Our analysis is based on trace-driven and trap-driven simulations and fully accounts for both the application and operating-system components of the workloads.

This paper then reexamines a collection of previously proposed hardware mechanisms for improving instruction-fetch performance in the context of the IBS workloads. We study the impact of cache organization, transfer bandwidth, prefetching, and pipelined memory systems on machines that rely on relatively small primary instruction caches to facilitate increased clock rates. We find that, although of little use for SPEC, the right combination of these techniques substantially benefits IBS. Even so, under IBS, a stubborn lower bound on the instruction-fetch CPI remains as an obstacle to improving overall processor performance.
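The relationship between instruction-cache miss ratio and instruction-fetch CPI can be illustrated with a minimal trace-driven sketch. This is not the authors' simulator: the function names `simulate_direct_mapped` and `fetch_cpi`, the direct-mapped organization, the synthetic loop trace, and the 10-cycle miss penalty are all assumptions chosen for illustration, and the CPI estimate assumes one cache access per instruction with a fixed miss penalty.

```python
def simulate_direct_mapped(trace, cache_bytes, line_bytes):
    """Return the miss ratio of a direct-mapped cache over a list of fetch addresses."""
    num_lines = cache_bytes // line_bytes
    tags = [None] * num_lines          # one stored block number per cache line
    misses = 0
    for addr in trace:
        block = addr // line_bytes     # memory block containing this address
        index = block % num_lines      # cache line the block maps to
        if tags[index] != block:       # cold or conflict miss: fill the line
            tags[index] = block
            misses += 1
    return misses / len(trace)


def fetch_cpi(miss_ratio, miss_penalty_cycles):
    """Stall cycles per instruction due to instruction-cache misses
    (assumes one fetch per instruction and a fixed miss penalty)."""
    return miss_ratio * miss_penalty_cycles


# Hypothetical workload: a 1 KB loop of 4-byte instructions executed 4 times.
loop_trace = list(range(0, 1024, 4)) * 4

small = simulate_direct_mapped(loop_trace, cache_bytes=512, line_bytes=32)
large = simulate_direct_mapped(loop_trace, cache_bytes=2048, line_bytes=32)

print(small, fetch_cpi(small, 10))   # loop does not fit: 0.125 1.25
print(large, fetch_cpi(large, 10))   # loop fits: 0.03125 0.3125
```

In this toy trace the loop's code footprint exceeds the smaller cache, so every line is refetched on each iteration. This is the kind of behavior a larger code footprint would be expected to produce in a small primary instruction cache.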