Efficient simulation of caches under optimal replacement with applications to miss characterization
SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Complexity/performance tradeoffs with non-blocking loads
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
HLS: combining statistical and symbolic simulation to guide microprocessor designs
Proceedings of the 27th annual international symposium on Computer architecture
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Computer
Reducing State Loss For Effective Trace Sampling of Superscalar Processors
ICCD '96 Proceedings of the 1996 International Conference on Computer Design, VLSI in Computers and Processors
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Modeling Superscalar Processors via Statistical Simulation
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Lockup-free instruction fetch/prefetch cache organization
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Representative Traces for Processor Models with Infinite Cache
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
A Framework for Statistical Modeling of Superscalar Processor Performance
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling
Proceedings of the 30th annual international symposium on Computer architecture
Picking Statistically Valid and Early Simulation Points
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
A First-Order Superscalar Processor Model
Proceedings of the 31st annual international symposium on Computer architecture
Control Flow Modeling in Statistical Simulation for Accurate and Efficient Processor Design Studies
Proceedings of the 31st annual international symposium on Computer architecture
Improved automatic testcase synthesis for performance model validation
Proceedings of the 19th annual international conference on Supercomputing
MinneSPEC: A New SPEC Benchmark Workload for Simulation-Based Computer Architecture Research
IEEE Computer Architecture Letters
Efficient design space exploration of high performance embedded out-of-order processors
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Accurate memory data flow modeling in statistical simulation
Proceedings of the 20th annual international conference on Supercomputing
Microprocessor power estimation using profile-driven program synthesis
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Distilling the essence of proprietary workloads into miniature benchmarks
ACM Transactions on Architecture and Code Optimization (TACO)
Accurately modeling superscalar processor performance with reduced trace
Journal of Parallel and Distributed Computing
Hi-index | 14.98 |
Microprocessor design is both complex and time-consuming: exploring a huge design space for identifying the optimal design under a number of constraints is infeasible using detailed architectural simulation of entire benchmark executions. Statistical simulation is a recently introduced approach for efficiently culling the microprocessor design space. The basic idea of statistical simulation is to collect a number of important program characteristics and to generate a synthetic trace from it. Simulating this synthetic trace is extremely fast as it contains a million instructions only. This paper improves the statistical simulation methodology by proposing accurate memory data flow models. We propose (i) cache miss correlation, or measuring cache statistics conditionally dependent on the global cache hit/miss history, for modeling cache miss patterns and memory-level parallelism, (ii) cache line reuse distributions for modeling accesses to outstanding cache lines, and (iii) through-memory read-after-write dependency distributions for modeling load forwarding and bypassing. Our experiments using the SPEC CPU2000 benchmarks show substantial improvements compared to current state-of-the-art statistical simulation methods. For example, for our baseline configuration, we reduce the average IPC prediction error from 10.9% to 2.1%; the maximum error observed equals 5.8%.