ACM Transactions on Graphics (TOG)
Cache performance of operating system and multiprogramming workloads
ACM Transactions on Computer Systems (TOCS)
On the inclusion properties for multi-level cache hierarchies
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
A simulation study of two-level caches
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Measuring VAX 8800 performance with a histogram hardware monitor
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Multiprocessor cache analysis using ATUM
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Program optimization for instruction caches
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Characteristics of performance-optimal multi-level cache hierarchies
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Organization and performance of a two-level virtual-real cache hierarchy
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Improving performance of small on-chip instruction caches
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Achieving high instruction cache performance with an optimizing compiler
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
The effect of context switches on cache performance
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Implementing a cache for a high-performance GaAs microprocessor
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
An introduction to object-oriented programming
An introduction to object-oriented programming
VAX architecture reference manual (2nd ed.)
VAX architecture reference manual (2nd ed.)
Inside Windows NT
Page placement algorithms for large real-indexed caches
ACM Transactions on Computer Systems (TOCS)
Performance optimization of pipelined primary cache
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Characterizing the caching and synchronization performance of a multiprocessor operating system
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Prefetching in supercomputer instruction caches
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
The dynamics of the computer industry: modeling the supply of workstations and their components
The dynamics of the computer industry: modeling the supply of workstations and their components
Design tradeoffs for software-managed TLBs
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Architectural support for translation table management in large address space machines
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The impact of operating system structure on memory system performance
SOSP '93 Proceedings of the fourteenth ACM symposium on Operating systems principles
Tcl and the Tk toolkit
Shade: a fast instruction-set simulator for execution profiling
SIGMETRICS '94 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Evaluating stream buffers as a secondary cache replacement
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Tradeoffs in two-level on-chip caching
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Characterization of alpha AXP performance using TP and SPEC workloads
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Optimal allocation of on-chip memory for multiple-API operating systems
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Trap-driven simulation with Tapeworm II
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Contrasting characteristics and cache performance of technical and multi-user commercial workloads
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Avoiding conflict misses dynamically in large direct-mapped caches
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Performance of the VAX-11/780 translation buffer: simulation and measurement
ACM Transactions on Computer Systems (TOCS)
Cache behavior in the presence of speculative execution: the benefits of misprediction
Cache behavior in the presence of speculative execution: the benefits of misprediction
Trap-driven memory simulation
The performance impact of block sizes and fetch strategies
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
The TLB slice—a low-cost high-speed address translation mechanism
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Cache evaluation and the impact of workload choice
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
ACM Computing Surveys (CSUR)
Cache Performance in the VAX-11/780
ACM Transactions on Computer Systems (TOCS)
Cache memory performance in a unix enviroment
ACM SIGARCH Computer Architecture News
Translation buffer performance in a UNIX enviroment
ACM SIGARCH Computer Architecture News
A Model and Prototype of VMS Using the Mach 3.0 Kernel
Proceedings of the Workshop on Micro-kernels and Other Kernel Architectures
The KeyKOS Nanokernel Architecture
Proceedings of the Workshop on Micro-kernels and Other Kernel Architectures
Proceedings of the Workshop on Micro-kernels and Other Kernel Architectures
A Characterization of Processor Performance in the vax-11/780
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
Optimizing instruction cache performance for operating system intensive workloads
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Page allocation to reduce access time of physical caches
Page allocation to reduce access time of physical caches
The measured performance of personal computer operating systems
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
The role of adaptivity in two-level adaptive branch prediction
Proceedings of the 28th annual international symposium on Microarchitecture
The measured performance of personal computer operating systems
ACM Transactions on Computer Systems (TOCS) - Special issue on operating system principles
An analysis of dynamic branch prediction schemes on system workloads
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Correlation and aliasing in dynamic branch predictors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Instruction prefetching of systems codes with layout optimized for reduced cache misses
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Analysis of branch prediction via data compression
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
The structure and performance of interpreters
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Trace cache: a low latency approach to high bandwidth instruction fetching
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Assigning confidence to conditional branch predictions
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Increasing the instruction fetch rate via block-structured instruction set architectures
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Trap-driven memory simulation with Tapeworm II
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Trace-driven memory simulation: a survey
ACM Computing Surveys (CSUR)
Trading conflict and capacity aliasing in conditional branch predictors
Proceedings of the 24th annual international symposium on Computer architecture
A language for describing predictors and its application to automatic synthesis
Proceedings of the 24th annual international symposium on Computer architecture
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Execution characteristics of desktop applications on Windows NT
Proceedings of the 25th annual international symposium on Computer architecture
A scalable front-end architecture for fast instruction delivery
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Comprehensive Hardware and Software Support for Operating Systems to Exploit MP Memory Hierarchies
IEEE Transactions on Computers
The impact of battery capacity and memory bandwidth on CPU speed-setting: a case study
ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
The Exception Handling Effectiveness of POSIX Operating Systems
IEEE Transactions on Software Engineering
Optimizations Enabled by a Decoupled Front-End Architecture
IEEE Transactions on Computers
An Exploration of Instruction Fetch Requirement in Out-of-Order Superscalar Processors
International Journal of Parallel Programming
Trace-Driven Memory Simulation: A Survey
Performance Evaluation: Origins and Directions
Efficient Microprocessor Design Space Exploration through Statistical Simulation
ANSS '03 Proceedings of the 36th annual symposium on Simulation
PADded Cache: A New Fault-Tolerance Technique for Cache Memories
VTS '99 Proceedings of the 1999 17TH IEEE VLSI Test Symposium
A case study of a system-level approach to power-aware computing
ACM Transactions on Embedded Computing Systems (TECS)
Journal of Systems and Software - Special issue: Performance modeling and analysis of computer systems and networks
Alloyed branch history: combining global and local branch history for robust performance
International Journal of Parallel Programming
An efficient single-pass trace compression technique utilizing instruction streams
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Operating-system level tracing tools for the DEC AXP architecture
WCAE-3 '97 Proceedings of the 1997 workshop on Computer architecture education
Lazy cache invalidation for self-modifying codes
Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems
SHIFT: shared history instruction fetch for lean-core server processors
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.01 |
Previous research has shown that the SPEC benchmarks achieve low miss ratios in relatively small instruction caches. This paper presents evidence that current software-development practices produce applications that exhibit substantially higher instruction-cache miss ratios than do the SPEC benchmarks. To represent these trends, we have assembled a collection of applications, called the Instruction Benchmark Suite (IBS), that provides a better test of instruction-cache performance. We discuss the rationale behind the design of IBS and characterize its behavior relative to the SPEC benchmark suite. Our analysis is based on trace-driven and trap-driven simulations and takes into full account both the application and operating-system components of the workloads.This paper then reexamines a collection of previously-proposed hardware mechanisms for improving instruction-fetch performance in the context of the IBS workloads. We study the impact of cache organization, transfer bandwidth, prefetching, and pipelined memory systems on machines that rely on the use of relatively small primary instruction caches to facilitate increased clock rates. We find that, although of little use for SPEC, the right combination of these techniques substantially benefits IBS. Even so, under IBS, a stubborn lower bound on the instruction-fetch CPI remains as an obstacle to improving overall processor performance.