Abstract: We measure the experimental error that arises from the use of non-validated simulators in computer architecture research, with the goal of increasing the rigor of simulation-based studies. We describe the methodology that we used to validate a microprocessor simulator against a Compaq DS-10L workstation, which contains an Alpha 21264 processor. Our evaluation suite consists of a set of 21 microbenchmarks that stress different aspects of the 21264 microarchitecture. Using the microbenchmark suite as the set of workloads, we describe how we reduced our simulator error to an arithmetic mean of 2%, and include details about the specific aspects of the pipeline that required extra care to reduce the error. We show how these low-level optimizations reduce average error from 40% to less than 20% on macrobenchmarks drawn from the SPEC2000 suite. Finally, we examine the degree to which performance optimizations are stable across different simulators, showing that researchers would draw different conclusions, in some cases, if using validated simulators.
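The abstract reports simulator error as an arithmetic mean over a benchmark suite. The sketch below illustrates one plausible way to compute such a figure. It is not taken from the paper: the error definition (absolute percent difference between a simulated and a hardware-measured value), the function names, and the sample numbers are all assumptions made for illustration.

```python
def percent_error(reference, simulated):
    """Absolute percent difference of a simulated value from the hardware reference.

    Assumed metric: |simulated - reference| / |reference| * 100.
    """
    if reference == 0:
        raise ValueError("reference measurement must be nonzero")
    return abs(simulated - reference) / abs(reference) * 100.0


def mean_percent_error(pairs):
    """Arithmetic mean of per-benchmark percent errors.

    `pairs` is an iterable of (reference, simulated) measurements.
    """
    errors = [percent_error(ref, sim) for ref, sim in pairs]
    if not errors:
        raise ValueError("at least one benchmark is required")
    return sum(errors) / len(errors)


# Hypothetical measurements (e.g. cycle counts); not data from the paper.
samples = {
    "bench_a": (1000.0, 1020.0),  # (hardware, simulator)
    "bench_b": (2000.0, 1960.0),
    "bench_c": (500.0, 505.0),
}

if __name__ == "__main__":
    for name, (ref, sim) in samples.items():
        print(f"{name}: {percent_error(ref, sim):.2f}% error")
    print(f"mean error: {mean_percent_error(samples.values()):.2f}%")
```

Taking the absolute value before averaging keeps overestimates and underestimates from cancelling each other out. A signed mean would instead show whether the simulator is biased in one direction.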