Efficient simulation of trace samples on parallel machines

Authors:
Lieven Eeckhout;Koen De Bosschere
Affiliations:
Department of Electronics and Information Systems (ELIS), Ghent University, Sint-Pietersnieuwstraat 41, Ghent B-9000, Belgium;Department of Electronics and Information Systems (ELIS), Ghent University, Sint-Pietersnieuwstraat 41, Ghent B-9000, Belgium
Venue:
Parallel Computing
Year:
2004




Abstract

Architectural simulations of microprocessors are extremely time-consuming nowadays due to the ever-increasing complexity of current applications. To obtain realistic workloads for architectural simulation, benchmarks need to be constructed with huge dynamic instruction counts. For example, SPEC released the CPU2000 benchmark suite, whose benchmarks have dynamic instruction counts of several hundred billion instructions. This is beneficial for evaluating real hardware. However, simulating these workloads is impractical, if not impossible, given that many simulation runs are needed to evaluate a large number of design points. Trace sampling is often proposed as a practical solution to this problem. In trace sampling, several representative samples are chosen from a real program trace. Since the sampled trace is much shorter than the original trace, a significant speedup is obtained. In this paper, we study how parallel processing can speed up the simulation of sampled traces even further. To this end, we propose and evaluate two ways of distributing sampled traces over parallel machines while taking into account the additional overhead due to the cold-start problem associated with trace sampling. We conclude that in many (practical) trace sampling scenarios, 'distributing samples' is more efficient than 'distributing traces'. This information will help researchers and computer designers speed up their simulation runs.
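
The trade-off described in the abstract can be illustrated with a small toy sketch. It is not the authors' simulator or either of their two distribution schemes. It only shows, under stated assumptions, how evenly spaced samples shorten a simulation, how a warm-up prefix (one common way of dealing with cold start, assumed here) adds back some cost, and how independent samples can be handed to parallel workers. The trace is a hypothetical list of 0/1 events, and all function names and parameters are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def sample_starts(trace_len, num_samples):
    """Start offsets of evenly spaced (periodic) samples. Assumed policy."""
    period = trace_len // num_samples
    return [i * period for i in range(num_samples)]


def simulate_sample(trace, start, sample_len, warmup_len):
    """Toy stand-in for a simulator run over one sample.

    Returns (instructions simulated, events counted, instructions measured).
    Warm-up instructions are simulated but excluded from the measurement.
    """
    warm_start = max(0, start - warmup_len)
    for _ in trace[warm_start:start]:
        pass  # a real simulator would update cache/predictor state here
    window = trace[start:start + sample_len]
    return (start - warm_start) + len(window), sum(window), len(window)


def simulate_sampled_trace(trace, num_samples, sample_len, warmup_len, workers):
    """Simulate every sample on a pool of workers and combine the results."""
    starts = sample_starts(len(trace), num_samples)
    # Threads keep the sketch self-contained; a real setup would more likely
    # use separate processes or machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(
            lambda s: simulate_sample(trace, s, sample_len, warmup_len),
            starts))
    simulated = sum(r[0] for r in results)
    events = sum(r[1] for r in results)
    measured = sum(r[2] for r in results)
    return events / measured, simulated


# Hypothetical trace: one "miss" event every fourth instruction.
trace = [1 if i % 4 == 0 else 0 for i in range(100_000)]
ratio, simulated = simulate_sampled_trace(
    trace, num_samples=10, sample_len=1000, warmup_len=500, workers=4)
print(ratio, simulated, len(trace) / simulated)
```

In this toy setting, the warm-up prefix raises the simulated instruction count above the number of measured instructions, which is the kind of cold-start overhead that any scheme for distributing sampled traces has to weigh against the gain from parallelism.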