Current software-based microarchitecture simulators are many orders of magnitude slower than the hardware they simulate. Hence, most microarchitecture design studies draw their conclusions from drastically truncated benchmark simulations that are often inaccurate and misleading. This paper presents the Sampling Microarchitecture Simulation (SMARTS) framework as an approach to enable fast and accurate performance measurements of full-length benchmarks. SMARTS accelerates simulation by selectively measuring in detail only an appropriate benchmark subset. SMARTS prescribes a statistically sound procedure for configuring a systematic sampling simulation run to achieve a desired quantifiable confidence in estimates.

Analysis of 41 of the 45 possible SPEC2K benchmark/input combinations shows that CPI and energy per instruction (EPI) can be estimated to within ±3% with 99.7% confidence by measuring fewer than 50 million instructions per benchmark. In practice, inaccuracy in microarchitectural state initialization introduces an additional uncertainty, which we empirically bound to ∼2% for the tested benchmarks. Our implementation of SMARTS achieves an actual average error of only 0.64% on CPI and 0.59% on EPI for the tested benchmarks, running with average speedups of 35 and 60 over detailed simulation of 8-way and 16-way out-of-order processors, respectively.
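The sampling procedure summarized above can be illustrated with a minimal sketch. It is not the SMARTS implementation. It applies the standard sampling-theory bound n ≥ (z·V/ε)², where V is the coefficient of variation of the measured metric, ε is the target relative error, and z is the normal quantile for the chosen confidence level (z = 3 corresponds to roughly 99.7%). The function names, the synthetic per-unit CPI trace, and the assumed value of V are all hypothetical and exist only for illustration.

```python
import math
import random
import statistics


def required_samples(cv, rel_error, z):
    """Smallest n with z * cv / sqrt(n) <= rel_error, i.e. n >= (z*cv/rel_error)^2."""
    return math.ceil((z * cv / rel_error) ** 2)


def systematic_sample(values, n, offset=0):
    """Take every k-th element, with k = len(values) // n, starting at offset."""
    k = max(1, len(values) // n)
    return values[offset % k::k]


def estimate_mean(sample, z):
    """Return the sample mean and the relative half-width of its confidence interval."""
    mean = statistics.fmean(sample)
    cv = statistics.stdev(sample) / mean
    rel_half_width = z * cv / math.sqrt(len(sample))
    return mean, rel_half_width


if __name__ == "__main__":
    random.seed(1)
    # Hypothetical trace: one CPI value per sampling unit, two alternating
    # phases plus noise. A real run would measure these in a detailed simulator.
    trace = [
        (1.0 if (i // 5000) % 2 == 0 else 2.0) + random.uniform(-0.2, 0.2)
        for i in range(200_000)
    ]
    z = 3.0  # about 99.7% confidence under a normal approximation
    n = required_samples(cv=0.4, rel_error=0.03, z=z)  # cv is an assumed guess
    sample = systematic_sample(trace, n)
    mean, rel_half_width = estimate_mean(sample, z)
    true_mean = statistics.fmean(trace)
    print(f"units measured: {len(sample)} of {len(trace)}")
    print(f"estimated CPI: {mean:.4f} (+/- {rel_half_width:.2%})")
    print(f"true CPI:      {true_mean:.4f}")
```

If the relative half-width reported by `estimate_mean` exceeds the target, the measured coefficient of variation was larger than the initial guess, and the run can be repeated with a sample count recomputed from the measured value. The sketch ignores the state-initialization (warm-up) error that the abstract bounds separately.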