The power of high-performance computing (HPC) is applied to simulate highly complex systems and processes in many scientific communities, e.g., in particle physics, weather and climate research, the biosciences, materials science, pharmaceutics, astronomy, and finance. Current HPC systems are so complex that designing such a system, including architecture design-space exploration and performance prediction, requires HPC-like simulation capabilities. To this end, we developed an OMNEST-based simulation environment that enables studying the impact of an HPC machine's communication subsystem on the overall system's performance for specific workloads. As the scale of current high-end HPC systems is in the range of hundreds of thousands of processing cores, full-system simulation---at an abstraction level that still maintains a reasonably high level of detail---is infeasible without resorting to parallel simulation, the main limiting factors being simulation run time and memory footprint. We describe our experiences in adapting our simulation environment to take advantage of the parallel distributed simulation capabilities provided by OMNEST. We present results obtained on a many-core SMP machine as well as on a small-scale InfiniBand cluster. Furthermore, we ported our simulation environment, including OMNEST itself, to the massively parallel IBM Blue Gene®/P platform. We report results from initial experiments on this platform using up to 512 cores in parallel.
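For context, the parallel distributed simulation support in OMNEST/OMNeT++ that the abstract refers to is enabled through the ini configuration file: the model is split into partitions, each run as a separate process, with the logical processes kept causally consistent by a conservative synchronization protocol such as null messages. A minimal sketch is shown below; the module path `net.node[..]` and the partition split are hypothetical illustrations, not the configuration used in the paper.

```ini
[General]
network = net

# Run the model as a parallel distributed simulation.
parallel-simulation = true

# Exchange events between logical processes over MPI.
parsim-communications-class = "cMPICommunications"

# Conservative synchronization via the null message protocol.
parsim-synchronization-class = "cNullMessageProtocol"

# Assign submodules to partitions (hypothetical two-way split
# of a 512-node model across MPI ranks 0 and 1).
net.node[0..255].partition-id = 0
net.node[256..511].partition-id = 1
```

Each partition is then launched as one MPI rank (e.g., via `mpirun`), so memory footprint and event-processing load are divided across processes, which is what makes full-system simulation at this scale tractable.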