Parallelization of IBM Mambo system simulator in functional modes

Authors:
Kun Wang;Yu Zhang;Huayong Wang;Xiaowei Shen
Affiliations:
IBM China Research Lab;IBM China Research Lab;IBM China Research Lab;IBM China Research Lab
Venue:
ACM SIGOPS Operating Systems Review
Year:
2008

References (13):
The PowerPC architecture: a specification for a new family of RISC processors
Embra: fast and flexible machine simulation. Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
DAISY: dynamic compilation for 100% architectural compatibility. Proceedings of the 24th annual international symposium on Computer architecture
Complete Computer System Simulation: The SimOS Approach. IEEE Parallel & Distributed Technology: Systems & Technology
Simics: A Full System Simulation Platform. Computer
An overview of the BlueGene/L Supercomputer. Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Design and validation of a performance and power simulator for PowerPC systems. IBM Journal of Research and Development
Mambo: a full system simulator for the PowerPC architecture. ACM SIGMETRICS Performance Evaluation Review - Special issue on tools for computer architecture research
The BlueGene/L pseudo cycle-accurate simulator. ISPASS '04 Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software
A practical FPGA-based framework for novel CMP research. Proceedings of the 2007 ACM/SIGDA 15th international symposium on Field programmable gate arrays
QEMU, a fast and portable dynamic translator. ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference
ATLAS: a chip-multiprocessor with transactional memory support. Proceedings of the conference on Design, automation and test in Europe
FPGA-Accelerated Simulation Technologies (FAST): Fast, Full-System, Cycle-Accurate Simulators. Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture

Cited by (8):
ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs. ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Performance of large low-associativity caches. ACM SIGMETRICS Performance Evaluation Review
COREMU: a scalable and portable parallel full-system emulator. Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Generalized just-in-time trace compilation using a parallel task farm in a dynamic binary translator. Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
P-GAS: Parallelizing a Cycle-Accurate Event-Driven Many-Core Processor Simulator Using Parallel Discrete Event Simulation. PADS '10 Proceedings of the 2010 IEEE Workshop on Principles of Advanced and Distributed Simulation
MCEmu: A Framework for Software Development and Performance Analysis of Multicore Systems. ACM Transactions on Design Automation of Electronic Systems (TODAES)
A real-time, energy-efficient system software suite for heterogeneous multicore platforms. Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
CRAW/P: a workload partition method for the efficient parallel simulation of manycores. Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing


Abstract

Mambo [4] is IBM's full-system simulator for PowerPC systems. It provides a complete set of simulation tools that help IBM and its partners with pre-hardware development and performance evaluation of future systems. Currently, Mambo simulates a target system on a single host thread, so as the number of cores in the target system increases, Mambo's simulation performance per core decreases. As the so-called "multi-core era" approaches, both target and host systems will have more and more cores, and it is very important for Mambo to simulate a multi-core target system efficiently on a multi-core host system. Parallelization is a natural way to speed up Mambo in this situation. Parallel Mambo (P-Mambo) is a multi-threaded implementation of Mambo. Mambo's simulation engine is implemented as a user-level thread scheduler. We propose a multi-scheduler method to adapt the simulation engine to multi-threaded execution. Based on this method, we propose a core-based module partition that achieves both high inter-scheduler parallelism and low inter-scheduler dependency. Protecting shared resources is crucial to both the correctness and the performance of P-Mambo. Because P-Mambo has two tiers of threads, protecting shared resources with OS-level locks alone can introduce deadlocks caused by user-level context switches. We propose a new lock mechanism to handle this problem. Since Mambo is an ongoing project with many modules still under development, P-Mambo must also coexist with new modules. We propose a global-lock-based method to guarantee the compatibility of P-Mambo with future Mambo modules. We have implemented the first version of P-Mambo in functional modes and evaluated its performance on the OpenMP implementation of the NAS Parallel Benchmarks (NPB) 3.2 [12]. Preliminary experimental results show that P-Mambo achieves an average speedup of 3.4 on a 4-core host machine.
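
The deadlock risk with two tiers of threads can be illustrated with a small sketch. The abstract does not describe the proposed lock mechanism, so the code below is not P-Mambo's design. It shows one common workaround under stated assumptions: user-level threads are modeled as Python generators, and a contended lock is acquired with a non-blocking try-lock that yields to the user-level scheduler instead of blocking the host thread. The names `YieldingLock`, `Scheduler`, `worker`, and `demo` are hypothetical.

```python
import threading
from collections import deque


class YieldingLock:
    """Hypothetical lock for user-level threads (not P-Mambo's mechanism).

    A plain blocking OS lock would stall the whole host thread if the
    holder is another user-level thread on the same scheduler, and the
    holder could then never run to release it.
    """

    def __init__(self):
        self._os_lock = threading.Lock()

    def acquire(self):
        # Generator: retry a non-blocking acquire, yielding to the
        # user-level scheduler after every failed attempt.
        while not self._os_lock.acquire(blocking=False):
            yield

    def release(self):
        self._os_lock.release()


class Scheduler:
    """Round-robin user-level scheduler running on one host thread."""

    def __init__(self):
        self.ready = deque()

    def spawn(self, task):
        self.ready.append(task)

    def run(self):
        while self.ready:
            task = self.ready.popleft()
            try:
                next(task)              # run until the next yield
            except StopIteration:
                continue                # task finished, drop it
            self.ready.append(task)     # task yielded, requeue it


def worker(name, lock, log, hold_steps):
    yield from lock.acquire()
    log.append(f"{name} acquired")
    for _ in range(hold_steps):
        yield                           # context switch while holding the lock
    log.append(f"{name} released")
    lock.release()


def demo():
    lock, log = YieldingLock(), []
    sched = Scheduler()
    sched.spawn(worker("A", lock, log, hold_steps=2))
    sched.spawn(worker("B", lock, log, hold_steps=0))
    sched.run()
    return log


if __name__ == "__main__":
    print(demo())
```

In this sketch, task A is switched out while it holds the lock. Task B then fails its try-lock and yields rather than blocking, so A gets to run again and release the lock. Replacing the try-lock with a blocking `acquire()` would hang the single host thread at B's first attempt.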