A model for estimating trace-sample miss ratios

Authors:
David A. Wood;Mark D. Hill;R. E. Kessler
Venue:
SIGMETRICS '91 Proceedings of the 1991 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Year:
1991

Abstract

Unknown references, also known as cold-start misses, arise during trace-driven simulation of uniprocessor caches because of the unknown initial conditions. Accurately estimating the miss ratio of unknown references, denoted by μ, is particularly important when simulating large caches with short trace samples, since many references may be unknown.

In this paper we make three contributions regarding μ. First, we provide empirical evidence that μ is much larger than the overall miss ratio (e.g., 0.40 vs. 0.02). Prior work suggests that they should be the same. Second, we develop a model that explains our empirical results for long trace samples. In our model, each block frame is either live, if its next reference will hit, or dead, if its next reference will miss. We model each block frame as an alternating renewal process, and use the renewal-reward theorem to show that μ is simply the fraction of time block frames are dead. Finally, we extend the model to handle short trace samples and use it to develop several estimators of μ. Trace-driven simulation results show these estimators lead to better estimates of overall miss ratios than do previous methods.
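The live/dead definition can be illustrated with a small sketch. This is not the paper's estimator; it is a minimal illustration that assumes a direct-mapped cache, measures time in references, and uses a hypothetical function name and a made-up toy trace. For each block frame, the interval before a reference counts as live if that reference hits and as dead if it misses, so the dead fraction approximates the chance that a frame's first reference in a sample starting at a random point would miss.

```python
def dead_fraction_and_miss_ratio(trace, num_frames):
    """Return (fraction of frame-time spent dead, miss ratio of known references).

    Assumptions (illustrative only): direct-mapped cache, frame = block % num_frames,
    time measured in reference counts. The first reference to each frame is an
    unknown reference and is excluded from both figures.
    """
    resident = {}   # frame index -> block currently held
    last_ref = {}   # frame index -> time of the most recent reference
    live_time = dead_time = 0
    hits = misses = 0

    for t, block in enumerate(trace):
        frame = block % num_frames
        if frame in last_ref:
            gap = t - last_ref[frame]
            if resident[frame] == block:
                hits += 1
                live_time += gap    # frame was live: this reference hit
            else:
                misses += 1
                dead_time += gap    # frame was dead: this reference missed
        resident[frame] = block
        last_ref[frame] = t

    if hits + misses == 0:
        raise ValueError("trace never re-references a block frame")
    return dead_time / (live_time + dead_time), misses / (hits + misses)


# Toy trace: frame 0 is hit a few times, then sits idle for a while before
# a conflicting block replaces its contents.
toy_trace = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2]
dead, miss = dead_fraction_and_miss_ratio(toy_trace, num_frames=2)
print(f"dead fraction = {dead:.2f}, miss ratio = {miss:.2f}")
# prints "dead fraction = 0.47, miss ratio = 0.11"
```

In this toy trace, hits arrive in quick succession while the single miss follows a long idle interval, so the time-weighted dead fraction is well above the per-reference miss ratio. That is the same qualitative gap the abstract reports between μ and the overall miss ratio.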