Parallel simulation of chip-multiprocessor architectures

Authors:
Matthew Chidester;Alan George
Affiliations:
Intel Corporation, Hillsboro, OR;University of Florida, Gainesville, FL
Venue:
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Year:
2002

Citing 23
Cited 17

Reducing Null Messages in Misra's Distributed Discrete Event Simulation Method

IEEE Transactions on Software Engineering
The Wisconsin Wind Tunnel: virtual prototyping of parallel computers

SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Distributed simulation of parallel DSP architectures on workstation clusters

Simulation
The case for a single-chip multiprocessor

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Modeling cost/performance of a parallel computer simulator

ACM Transactions on Modeling and Computer Simulation (TOMACS)
The M-Machine multicomputer

International Journal of Parallel Programming - Special issue on instruction-level parallel processing—part II
Hardware and software support for speculative execution of sequential binaries on a chip-multiprocessor

ICS '98 Proceedings of the 12th international conference on Supercomputing
Computer architecture (2nd ed.): a quantitative approach

Computer architecture (2nd ed.): a quantitative approach
Branch Prediction, Instruction-Window Size, and Cache Size: Performance Trade-Offs and Simulation Techniques

IEEE Transactions on Computers
Clock rate versus IPC: the end of the road for conventional microarchitectures

Proceedings of the 27th annual international symposium on Computer architecture
Piranha: a scalable architecture based on single-chip multiprocessing

Proceedings of the 27th annual international symposium on Computer architecture
Architecture of the Atlas Chip-Multiprocessor: Dynamically Parallelizing Irregular Applications

IEEE Transactions on Computers
Parallel and Distributed Simulation Systems

Parallel and Distributed Simulation Systems
Parallel Computer Architecture: A Hardware/Software Approach

Parallel Computer Architecture: A Hardware/Software Approach
Wisconsin Wind Tunnel II: A Fast, Portable Parallel Architecture Simulator

IEEE Concurrency
A Single-Chip Multiprocessor

Computer
Performance Simulation of an Alpha Microprocessor

Computer
Myrinet: A Gigabit-per-Second Local Area Network

IEEE Micro
A Case for NOW (Networks of Workstations)

IEEE Micro
Improving the Accuracy vs. Speed Tradeoff for Simulating Shared-Memory Multiprocessors with ILP Processors

HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
The Alpha 21264 Microprocessor Architecture

ICCD '98 Proceedings of the International Conference on Computer Design
Mint Tutorial and User Manual

Mint Tutorial and User Manual

Simulation of Computer Architectures: Simulators, Benchmarks, Methodologies, and Recommendations

IEEE Transactions on Computers
Full-system chip multiprocessor power evaluations using FPGA-based emulation

Proceedings of the 13th international symposium on Low power electronics and design
ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs

ACM Transactions on Reconfigurable Technology and Systems (TRETS)
MPTLsim: a cycle-accurate, full-system simulator for x86-64 multicore architectures with coherent caches

ACM SIGARCH Computer Architecture News
SlackSim: a platform for parallel simulations of CMPs on CMPs

ACM SIGARCH Computer Architecture News
SlackSim: a platform for parallel simulations of CMPs on CMPs

ACM SIGMETRICS Performance Evaluation Review
Parallel simulation of chip-multiprocessor by using multi-threading

AsiaMS '07 Proceedings of the IASTED Asian Conference on Modelling and Simulation
Two-phase trace-driven simulation (TPTS): a fast multicore processor architecture simulation approach

Software—Practice & Experience
FastFwd: an efficient hardware acceleration technique for trace-driven network-on-chip simulation

CODES/ISSS '10 Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Multiprocessor on chip: beating the simulation wall through multiobjective design space exploration with direct execution

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Adaptive and Speculative Slack Simulations of CMPs on CMPs

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Accelerating UNISIM-Based Cycle-Level Microarchitectural Simulations on Multicore Platforms

ACM Transactions on Design Automation of Electronic Systems (TODAES)
CRAW/P: a workload partition method for the efficient parallel simulation of manycores

Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
A survey on cache tuning from a power/energy perspective

ACM Computing Surveys (CSUR)
ZSim: fast and accurate microarchitectural simulation of thousand-core systems

Proceedings of the 40th Annual International Symposium on Computer Architecture
Optimizing parallel simulation of multicore systems using domain-specific knowledge

Proceedings of the 2013 ACM SIGSIM conference on Principles of advanced discrete simulation
The COMPLEX reference framework for HW/SW co-design and power management supporting platform-based design-space exploration

Microprocessors & Microsystems

Abstract

Chip-multiprocessor (CMP) architectures present a challenge for efficient simulation, combining the requirements of a detailed microprocessor simulator with those of a tightly coupled parallel system. This paper presents a distributed simulator for target CMPs that is based on the Message Passing Interface (MPI) and designed to run on a host cluster of workstations. Microbenchmark-based evaluation is used to narrow the parallelization design space by measuring the performance impact of four choices: distributed vs. centralized target L2 simulation, blocking vs. non-blocking remote cache accesses, null-message vs. barrier techniques for clock synchronization, and network interconnect selection. The best combination is shown to yield speedups of up to 16 on a 9-node cluster of dual-CPU workstations, partially due to cache effects.
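The two clock-synchronization techniques named in the abstract can be illustrated with a small sketch. This is not the authors' simulator and it uses no MPI: it is a single-threaded Python toy, and the names `LogicalProcess`, `run_null_message`, `run_barrier`, `lookahead`, and `quantum` are all assumptions made for illustration. In the null-message style, each process promises its neighbors that it will send nothing earlier than its local time plus a lookahead, and a receiver advances only to the smallest promise it holds. In the barrier style, all processes advance by a fixed quantum and then wait for one another.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalProcess:
    """Hypothetical stand-in for one simulated node (not from the paper)."""
    name: str
    lookahead: int                      # minimum delay before this LP can affect others
    local_time: int = 0
    channel_clock: dict = field(default_factory=dict)  # sender name -> latest promise

    def null_message(self) -> int:
        # Promise: "I will send nothing with a timestamp earlier than this."
        return self.local_time + self.lookahead

    def safe_time(self) -> int:
        # An LP may only advance to the smallest promise on its input channels.
        return min(self.channel_clock.values(), default=self.local_time)

    def advance(self) -> None:
        self.local_time = max(self.local_time, self.safe_time())


def run_null_message(lps, end_time):
    """Advance all LPs with null messages; return the number of messages sent."""
    messages = 0
    while any(lp.local_time < end_time for lp in lps):
        # Phase 1: every LP sends a null message to every other LP.
        for sender in lps:
            promise = sender.null_message()
            for receiver in lps:
                if receiver is not sender:
                    receiver.channel_clock[sender.name] = promise
                    messages += 1
        # Phase 2: every LP advances as far as its input channels allow.
        for lp in lps:
            lp.advance()
    return messages


def run_barrier(lps, end_time, quantum):
    """Advance all LPs in lockstep; return the number of global barriers."""
    barriers = 0
    while any(lp.local_time < end_time for lp in lps):
        for lp in lps:
            lp.local_time += quantum    # simulate one quantum of target time
        barriers += 1                   # all LPs would wait here in a real system
    return barriers


if __name__ == "__main__":
    nodes = [LogicalProcess(n, lookahead=1) for n in "abc"]
    print("null messages sent:", run_null_message(nodes, end_time=4))
    nodes = [LogicalProcess(n, lookahead=1) for n in "abc"]
    print("barriers executed:", run_barrier(nodes, end_time=4, quantum=1))
```

In this toy, the null-message variant sends one message per ordered pair of processes per round, while the barrier variant performs one global synchronization per quantum. Which is cheaper on a real cluster depends on factors the sketch does not model, such as interconnect latency and how uneven the per-node workload is.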