The RASE (Rapid, Accurate Simulation Environment) for chip multiprocessors

Authors:
John D. Davis;Cong Fu;James Laudon
Affiliations:
Sun Microsystems;Sun Microsystems;Sun Microsystems
Venue:
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Year:
2005

Citing 20
Cited 4

Interleaving: a multithreading technique targeting multiprocessors and workstations
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems

Memory system characterization of commercial workloads
Proceedings of the 25th annual international symposium on Computer architecture

An analysis of database workload performance on simultaneous multithreaded processors
Proceedings of the 25th annual international symposium on Computer architecture

Performance of database workloads on shared-memory systems with out-of-order processors
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems

Piranha: a scalable architecture based on single-chip multiprocessing
Proceedings of the 27th annual international symposium on Computer architecture

FLASH vs. (Simulated) FLASH: closing the simulation loop
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems

Simics: A Full System Simulation Platform
Computer

System Optimization for OLTP Workloads
IEEE Micro

Validating Trace-Driven Microarchitectural Simulations
IEEE Micro

Simulating a $2M Commercial Server on a $2K PC
Computer

Peppermint and Sled: Tools for Evaluating SMP Systems Based on IA-64 (IPF) Processors
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium

Exploring the Design Space of Future CMPs
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques

Variability in Architectural Simulations of Multi-Threaded Workloads
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture

A comparative study of conservative and optimistic trace-driven simulations
SS '95 Proceedings of the 28th Annual Simulation Symposium

SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling
Proceedings of the 30th annual international symposium on Computer architecture

Phase tracking and prediction
Proceedings of the 30th annual international symposium on Computer architecture

The Fuzzy Correlation between Code and Performance Predictability
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture

Chip Multithreading: Opportunities and Challenges
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture

Maximizing CMP Throughput with Mediocre Cores
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques

A performance methodology for commercial servers
IBM Journal of Research and Development

Modeling and analysis of core-centric network processors
ACM Transactions on Embedded Computing Systems (TECS)

Multicore architectures with dynamically reconfigurable array processors for wireless broadband technologies
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Thread allocation in CMP-based multithreaded network processors
Parallel Computing

Abstract

We present RASE, a full-system, high-performance simulation methodology for simulating complex server applications and server-class chip multiprocessors enabled with fine-grain multithreading (CMTs). RASE combines application knowledge, operating system information, and data access patterns with an instruction stream from a highly tuned, scalable steady-state benchmark [5] [22] to generate multiple representative instruction streams that can be mapped to a variety of CMT configurations. We use execution-driven simulation to generate instruction streams for M processors and store them as instruction trace files (several billion instructions per processor) that can be post-processed and augmented to simulate systems with more than M processors. We use SPEC JBB2000, TPC-C, and an XML server benchmark to compare the performance estimates of RASE against a reference prototype CMT system. By varying M, we find that our trace-driven simulation methodology predicts the instructions per cycle (IPC) of the reference hardware to within 5% for these applications. Without post-processing of the traces, the performance prediction accuracy degrades, in the best cases, to 20-40% of the real IPC for instruction traces that require a high replication factor.
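
To make the terms "replication factor" and "within 5% of the IPC" concrete, the following is a minimal Python sketch, not the authors' tooling. It assumes naive round-robin reuse of M trace streams across N hardware threads (that is, replication without the post-processing the abstract describes) and assumes accuracy is measured as relative error against the hardware IPC. The function names and all numbers are hypothetical.

```python
from math import ceil


def replication_factor(num_traces: int, num_threads: int) -> int:
    """Maximum number of hardware threads that share one trace stream
    when M traces are spread round-robin over N threads (assumption)."""
    if num_traces <= 0 or num_threads <= 0:
        raise ValueError("trace and thread counts must be positive")
    return ceil(num_threads / num_traces)


def assign_traces(num_traces: int, num_threads: int) -> list[int]:
    """Naive mapping: hardware thread t replays trace (t mod M).
    This is plain replication, with no post-processing of the traces."""
    if num_traces <= 0 or num_threads <= 0:
        raise ValueError("trace and thread counts must be positive")
    return [t % num_traces for t in range(num_threads)]


def relative_ipc_error(predicted_ipc: float, measured_ipc: float) -> float:
    """Relative error of a simulated IPC against the hardware IPC."""
    if measured_ipc <= 0:
        raise ValueError("measured IPC must be positive")
    return abs(predicted_ipc - measured_ipc) / measured_ipc


if __name__ == "__main__":
    # Hypothetical example: 4 captured traces driving a 32-thread CMT model.
    print(replication_factor(4, 32))
    print(assign_traces(2, 5))
    # Hypothetical IPC values, chosen only to illustrate the metric.
    print(relative_ipc_error(0.95, 1.00))
```

Under these assumptions, a small M relative to the simulated thread count yields a high replication factor, which is the regime in which the abstract reports degraded accuracy for traces that are not post-processed.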