Cache performance of operating system and multiprogramming workloads
ACM Transactions on Computer Systems (TOCS)
Accurate Low-Cost Methods for Performance Evaluation of Cache Memory Systems
IEEE Transactions on Computers
Accurate low-cost methods for performance evaluation of cache memory systems
Accurate low-cost methods for performance evaluation of cache memory systems
Characteristics of performance-optimal multi-level cache hierarchies
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Mache: no-loss trace compaction
SIGMETRICS '89 Proceedings of the 1989 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Evaluating Associativity in CPU Caches
IEEE Transactions on Computers
High-performance computer architecture (2nd ed.)
High-performance computer architecture (2nd ed.)
Efficient trace-driven simulation method for cache performance analysis
SIGMETRICS '90 Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Blocking: exploiting spatial locality for trace compaction
SIGMETRICS '90 Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems
A model for estimating trace-sample miss ratios
SIGMETRICS '91 Proceedings of the 1991 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Analysis of multi-megabyte secondary CPU cache memories
Analysis of multi-megabyte secondary CPU cache memories
Generation and analysis of very long address traces
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
ACM Computing Surveys (CSUR)
Cold-start vs. warm-start miss ratios
Communications of the ACM
Lockup-free instruction fetch/prefetch cache organization
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Design and Evaluation of In-Cache Address Translation
Design and Evaluation of In-Cache Address Translation
Analysis of cache replacement-algorithms
Analysis of cache replacement-algorithms
Performance directed memory hierarchy design
Performance directed memory hierarchy design
Techniques for compressing program address traces
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Combining Trace Sampling with Single Pass Methods for Efficient Cache Simulation
IEEE Transactions on Computers
On the use of trace sampling for architectural studies of desktop applications
SIGMETRICS '99 Proceedings of the 1999 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Facilitating level three cache studies using set sampling
Proceedings of the 32nd conference on Winter simulation
Shared cache architectures for decision support systems
Performance Evaluation
An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Simulation Based HPC Workload Analysis
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Benchmarks and Standards for the Evaluation of Parallel Job Schedulers
IPPS/SPDP '99/JSSPP '99 Proceedings of the Job Scheduling Strategies for Parallel Processing
Workload Modeling for Performance Evaluation
Performance Evaluation of Complex Systems: Techniques and Tools, Performance 2002, Tutorial Lectures
Variability in Architectural Simulations of Multi-Threaded Workloads
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
The workload on parallel supercomputers: modeling the characteristics of rigid jobs
Journal of Parallel and Distributed Computing
Efficient simulation of trace samples on parallel machines
Parallel Computing
How to use SimPoint to pick simulation points
ACM SIGMETRICS Performance Evaluation Review - Special issue on tools for computer architecture research
Fast data-locality profiling of native execution
SIGMETRICS '05 Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
A sample-based cache mapping scheme
LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Replicating memory behavior for performance prediction
LCR '04 Proceedings of the 7th workshop on Workshop on languages, compilers, and run-time support for scalable systems
Optimal sample length for efficient cache simulation
Journal of Systems Architecture: the EUROMICRO Journal
Simulation of Computer Architectures: Simulators, Benchmarks, Methodologies, and Recommendations
IEEE Transactions on Computers
Application of full-system simulation in exploratory system design and development
IBM Journal of Research and Development
SMA: a self-monitored adaptive cache warm-up scheme for microprocessor simulation
International Journal of Parallel Programming
Statistical sampling of microarchitecture simulation
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Yet shorter warmup by combining no-state-loss and MRRL for sampled LRU cache simulation
Journal of Systems and Software - Special issue: Quality software
NSL-BLRL: Efficient CacheWarmup for Sampled Processor Simulation
ANSS '06 Proceedings of the 39th annual Symposium on Simulation
A cache design for high performance embedded systems
Journal of Embedded Computing - Cache exploitation in embedded systems
Branch Predictor Warmup for Sampled Simulation through Branch History Matching
Transactions on High-Performance Embedded Architectures and Compilers II
Branch history matching: branch predictor warmup for sampled simulation
HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
Exploiting stability to reduce time-space cost for memory tracing
ICCS'03 Proceedings of the 2003 international conference on Computational science: PartIII
Fast modeling of shared caches in multicore systems
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Efficient sampling startup for sampled processor simulation
HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers
DIMSim: a rapid two-level cache simulation approach for deadline-based MPSoCs
Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Per-thread cycle accounting in multicore processors
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Reuse-based online models for caches
Proceedings of the ACM SIGMETRICS/international conference on Measurement and modeling of computer systems
Hi-index | 14.99 |
The paper compares the trace-sampling techniques of set sampling and time sampling. Using the multi-billion reference traces of A. Borg et al. (1990), we apply both techniques to multi-megabyte caches, where sampling is most valuable. We evaluate whether either technique meets a 10% sampling goal: a method meets this goal if, at least 90% of the time, it estimates the trace's true misses per instruction with /spl les/10% relative error using /spl les/10% of the trace. Results for these traces and caches show that set sampling meets the 10% sampling goal, while time sampling does not. We also find that cold-start bias in time samples is most effectively reduced by the technique of D.A. Wood et al. (1991). Nevertheless, overcoming cold-start bias requires tens of millions of consecutive references.