A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches

Authors:
R. E. Kessler;M. D. Hill;D. A. Wood
Affiliations:
-;-;-
Venue:
IEEE Transactions on Computers
Year:
1994

Citing 19
Cited 33

Cache performance of operating system and multiprogramming workloads

ACM Transactions on Computer Systems (TOCS)
Accurate Low-Cost Methods for Performance Evaluation of Cache Memory Systems

IEEE Transactions on Computers
Accurate low-cost methods for performance evaluation of cache memory systems
Characteristics of performance-optimal multi-level cache hierarchies

ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Mache: no-loss trace compaction

SIGMETRICS '89 Proceedings of the 1989 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Evaluating Associativity in CPU Caches

IEEE Transactions on Computers
High-performance computer architecture (2nd ed.)
Efficient trace-driven simulation method for cache performance analysis

SIGMETRICS '90 Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Blocking: exploiting spatial locality for trace compaction

SIGMETRICS '90 Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems
A model for estimating trace-sample miss ratios

SIGMETRICS '91 Proceedings of the 1991 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Analysis of multi-megabyte secondary CPU cache memories
Generation and analysis of very long address traces

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Cache Memories

ACM Computing Surveys (CSUR)
Cold-start vs. warm-start miss ratios

Communications of the ACM
Lockup-free instruction fetch/prefetch cache organization

ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Design and Evaluation of In-Cache Address Translation
Analysis of cache replacement-algorithms
Performance directed memory hierarchy design

Techniques for compressing program address traces

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Combining Trace Sampling with Single Pass Methods for Efficient Cache Simulation

IEEE Transactions on Computers
On the use of trace sampling for architectural studies of desktop applications

SIGMETRICS '99 Proceedings of the 1999 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Facilitating level three cache studies using set sampling

Proceedings of the 32nd conference on Winter simulation
Shared cache architectures for decision support systems

Performance Evaluation
An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Validating Trace-Driven Microarchitectural Simulations

IEEE Micro
Simulation Based HPC Workload Analysis

IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Benchmarks and Standards for the Evaluation of Parallel Job Schedulers

IPPS/SPDP '99/JSSPP '99 Proceedings of the Job Scheduling Strategies for Parallel Processing
Workload Modeling for Performance Evaluation

Performance Evaluation of Complex Systems: Techniques and Tools, Performance 2002, Tutorial Lectures
Variability in Architectural Simulations of Multi-Threaded Workloads

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
The workload on parallel supercomputers: modeling the characteristics of rigid jobs

Journal of Parallel and Distributed Computing
Efficient simulation of trace samples on parallel machines

Parallel Computing
How to use SimPoint to pick simulation points

ACM SIGMETRICS Performance Evaluation Review - Special issue on tools for computer architecture research
Fast data-locality profiling of native execution

SIGMETRICS '05 Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
A sample-based cache mapping scheme

LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Replicating memory behavior for performance prediction

LCR '04 Proceedings of the 7th workshop on Workshop on languages, compilers, and run-time support for scalable systems
Optimal sample length for efficient cache simulation

Journal of Systems Architecture: the EUROMICRO Journal
Simulation of Computer Architectures: Simulators, Benchmarks, Methodologies, and Recommendations

IEEE Transactions on Computers
Application of full-system simulation in exploratory system design and development

IBM Journal of Research and Development
SMA: a self-monitored adaptive cache warm-up scheme for microprocessor simulation

International Journal of Parallel Programming
Statistical sampling of microarchitecture simulation

ACM Transactions on Modeling and Computer Simulation (TOMACS)
Yet shorter warmup by combining no-state-loss and MRRL for sampled LRU cache simulation

Journal of Systems and Software - Special issue: Quality software
NSL-BLRL: Efficient CacheWarmup for Sampled Processor Simulation

ANSS '06 Proceedings of the 39th annual Symposium on Simulation
A cache design for high performance embedded systems

Journal of Embedded Computing - Cache exploitation in embedded systems
Branch Predictor Warmup for Sampled Simulation through Branch History Matching

Transactions on High-Performance Embedded Architectures and Compilers II
Branch history matching: branch predictor warmup for sampled simulation

HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
Exploiting stability to reduce time-space cost for memory tracing

ICCS'03 Proceedings of the 2003 international conference on Computational science: PartIII
Fast modeling of shared caches in multicore systems

Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Efficient sampling startup for sampled processor simulation

HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers
DIMSim: a rapid two-level cache simulation approach for deadline-based MPSoCs

Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Per-thread cycle accounting in multicore processors

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Reuse-based online models for caches

Proceedings of the ACM SIGMETRICS/international conference on Measurement and modeling of computer systems

Quantified Score

Hi-index	14.99

Abstract

The paper compares the trace-sampling techniques of set sampling and time sampling. Using the multi-billion reference traces of A. Borg et al. (1990), we apply both techniques to multi-megabyte caches, where sampling is most valuable. We evaluate whether either technique meets a 10% sampling goal: a method meets this goal if, at least 90% of the time, it estimates the trace's true misses per instruction with ≤10% relative error using ≤10% of the trace. Results for these traces and caches show that set sampling meets the 10% sampling goal, while time sampling does not. We also find that cold-start bias in time samples is most effectively reduced by the technique of D.A. Wood et al. (1991). Nevertheless, overcoming cold-start bias requires tens of millions of consecutive references.
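
The 10% sampling goal stated in the abstract can be read as a simple acceptance test. The following Python sketch is illustrative only and is not taken from the paper: the function names, parameters, and example numbers are hypothetical, and it assumes each trial yields one misses-per-instruction (MPI) estimate from a sample that uses a known fraction of the full trace.

```python
def relative_error(estimate, true_value):
    """Relative error of an estimate against the known true value."""
    return abs(estimate - true_value) / true_value


def meets_sampling_goal(estimates, true_mpi, fraction_used,
                        max_error=0.10, max_fraction=0.10, min_success=0.90):
    """Check the 10% sampling goal as described in the abstract.

    estimates     -- MPI estimates, one per independent sample (hypothetical input)
    true_mpi      -- MPI measured on the full trace
    fraction_used -- fraction of the trace each sample consumes
    """
    # The sample may use at most 10% of the trace.
    if fraction_used > max_fraction or not estimates:
        return False
    # Count the estimates whose relative error is at most 10%.
    within = sum(1 for e in estimates
                 if relative_error(e, true_mpi) <= max_error)
    # The goal holds if at least 90% of the estimates are that accurate.
    return within / len(estimates) >= min_success


# Made-up numbers for illustration: nine accurate estimates and one outlier.
example_estimates = [0.0100, 0.0105, 0.0095, 0.0102, 0.0098,
                     0.0101, 0.0099, 0.0104, 0.0096, 0.0130]
goal_met = meets_sampling_goal(example_estimates, true_mpi=0.0100,
                               fraction_used=0.10)
```

With these made-up inputs, nine of the ten estimates fall within 10% of the true value, so the check passes; a second outlier, or a sample that uses more than 10% of the trace, would make it fail.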