Proceedings of the 24th annual international symposium on Computer architecture
FLASH vs. (Simulated) FLASH: closing the simulation loop
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Advanced Computer Architectures
Advanced Computer Architectures
Processor Architecture: From Dataflow to Superscalar and Beyond
Processor Architecture: From Dataflow to Superscalar and Beyond
Measuring Experimental Error in Microprocessor Simulation
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
The PowerPC 604 RISC microprocessor
IEEE Micro
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
IEEE Micro
Workload Design: Selecting Representative Program-Input Pairs
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Increasing Instruction-Level Parallelism with Instruction Precomputation (Research Note)
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
The Alpha 21164PC Microprocessor
COMPCON '97 Proceedings of the 42nd IEEE International Computer Conference
A Statistically Rigorous Approach for Improving Simulation Methodology
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
The Alpha 21264 Microprocessor Architecture
ICCD '98 Proceedings of the International Conference on Computer Design
Improving Processor Performance by Simplifying and Bypassing Trivial Computations
ICCD '02 Proceedings of the 2002 IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD'02)
MinneSPEC: A New SPEC Benchmark Workload for Simulation-Based Computer Architecture Research
IEEE Computer Architecture Letters
Accurate and efficient regression modeling for microarchitectural performance and power prediction
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Methods of inference and learning for performance modeling of parallel applications
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Finding representative workloads for computer system design
Finding representative workloads for computer system design
Applied inference: Case studies in microarchitectural design
ACM Transactions on Architecture and Code Optimization (TACO)
Vision for liquid architecture
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
SubsetTrio: An evolutionary, geometric, and statistical benchmark subsetting framework
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Efficient design space exploration of GPGPU architectures
Euro-Par'12 Proceedings of the 18th international conference on Parallel processing workshops
System performance evaluation by combining RTC and VHDL simulation: A case study on NICs
Journal of Systems Architecture: the EUROMICRO Journal
Hi-index | 14.99 |
Due to cost, time, and flexibility constraints, computer architects use simulators to explore the design space when developing new processors and to evaluate the performance of potential enhancements. However, despite this dependence on simulators, statistically rigorous simulation methodologies are typically not used in computer architecture research. A formal methodology can provide a sound basis for drawing conclusions gathered from simulation results by adding statistical rigor and, consequently, can increase the architect's confidence in the simulation results. This paper demonstrates the application of a rigorous statistical technique to the setup and analysis phases of the simulation process. Specifically, we apply a Plackett and Burman design to: 1) identify key processor parameters, 2) classify benchmarks based on how they affect the processor, and 3) analyze the effect of processor enhancements. Our results showed that, out of the 41 user-configurable parameters in SimpleScalar, only 10 had a significant effect on the execution time. Of those 10, the number of reorder buffer entries and the L2 cache latency were the two most significant ones, by far. Our results also showed that Instruction Precomputation驴a value reuse-like microarchitectural technique驴 primarily improves the processor's performance by relieving integer ALU contention.