Cache performance of operating system and multiprogramming workloads

Authors:
Anant Agarwal;John Hennessy;Mark Horowitz
Affiliations:
Computer Systems Laboratory, Stanford, CA;Computer Systems Laboratory, Stanford, CA;Computer Systems Laboratory, Stanford, CA
Venue:
ACM Transactions on Computer Systems (TOCS)
Year:
1988

Abstract

Large caches are necessary in current high-performance computer systems to provide the required high memory bandwidth. Because a small decrease in cache performance can result in significant system performance degradation, accurately characterizing the performance of large caches is important. Although measurements on actual systems have shown that operating systems and multiprogramming can affect cache performance, previous studies have not focused on these effects. We have developed a program tracing technique called ATUM (Address Tracing Using Microcode) that captures realistic traces of multitasking workloads including the operating system. Examining cache behavior using these traces from a VAX processor shows that both the operating system and multiprogramming activity significantly degrade cache performance, with an even greater proportional impact on large caches. From a careful analysis of the causes of this degradation, we explore various techniques to reduce this loss. While seemingly little can be done to mitigate the effect of system references, multitasking cache miss activity can be substantially reduced with small hardware additions.
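
The multiprogramming effect described in the abstract can be illustrated with a toy trace-driven simulation. The sketch below is not the authors' ATUM traces or their simulator: the traces are synthetic, the cache is a simple direct-mapped model whose tags include a process identifier, and all function names and parameters (`simulate`, `loop_trace`, `interleave`, the 4 KB cache, the 640-reference time slice) are assumptions chosen for illustration. It only shows how two processes that map to the same cache lines can evict each other's data at every context switch.

```python
def simulate(trace, num_lines, line_size=16):
    """Miss ratio of a direct-mapped cache for a trace of (pid, address) pairs."""
    tags = [None] * num_lines          # one (pid, tag) entry per cache line
    misses = 0
    total = 0
    for pid, addr in trace:
        block = addr // line_size
        index = block % num_lines
        tag = (pid, block // num_lines)  # pid in the tag: no false hits across processes
        total += 1
        if tags[index] != tag:
            misses += 1
            tags[index] = tag
    return misses / total


def loop_trace(pid, base, footprint, passes, stride=16):
    """Synthetic trace: a process sweeps the same memory region repeatedly."""
    return [(pid, base + off)
            for _ in range(passes)
            for off in range(0, footprint, stride)]


def interleave(traces, quantum):
    """Round-robin scheduling: each process runs `quantum` references per slice."""
    out = []
    pos = [0] * len(traces)
    while any(p < len(t) for p, t in zip(pos, traces)):
        for i, t in enumerate(traces):
            out.extend(t[pos[i]:pos[i] + quantum])
            pos[i] = min(pos[i] + quantum, len(t))
    return out


# Hypothetical workload: two processes, each looping over a 2 KB region
# at the same virtual addresses, on a 4 KB cache (256 lines of 16 bytes).
proc_a = loop_trace(pid=1, base=0, footprint=2048, passes=50)
proc_b = loop_trace(pid=2, base=0, footprint=2048, passes=50)

alone = simulate(proc_a, num_lines=256)
shared = simulate(interleave([proc_a, proc_b], quantum=640), num_lines=256)

print(f"uniprogrammed miss ratio:   {alone:.3f}")
print(f"multiprogrammed miss ratio: {shared:.3f}")
```

In this synthetic setup the single process misses only on its first sweep, while the interleaved run pays a full set of misses at the start of every time slice. The numbers are artifacts of the toy workload and say nothing about the magnitudes measured in the paper.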