As supercomputers scale to 1000 PFlop/s over the next decade, it is crucial to investigate how parallel applications perform at scale on future architectures, and how different architecture choices affect performance, as part of high-performance computing (HPC) hardware/software co-design. This paper summarizes recent efforts in designing and implementing a novel HPC hardware/software co-design toolkit. The presented Extreme-scale Simulator (xSim) permits running an HPC application in a controlled environment with millions of concurrent execution threads while observing its performance in a simulated extreme-scale HPC system using architectural models and virtual timing. This paper demonstrates the capabilities and usefulness of the xSim performance investigation toolkit: its scalability to 2^27 (about 134 million) simulated Message Passing Interface (MPI) ranks on 960 real processor cores, its capability to evaluate the performance of different MPI collective communication algorithms, and its ability to evaluate the performance of a basic Monte Carlo application with different architectural parameters.
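To illustrate why evaluating different collective communication algorithms at this scale matters, the following minimal sketch compares the modeled completion time of two broadcast strategies under a simple latency/bandwidth cost model. It is not xSim's architectural model or API; the function names, the cost model, and the parameter values are all assumptions made for illustration.

```python
# Illustrative only: a simple latency/bandwidth cost model for a broadcast.
# None of these names or parameter values come from xSim.

def message_time(msg_bytes, latency_s, bandwidth_bytes_per_s):
    """Modeled time to send one point-to-point message."""
    return latency_s + msg_bytes / bandwidth_bytes_per_s

def tree_rounds(ranks):
    """Rounds needed when the set of ranks holding the data doubles each round,
    i.e. ceil(log2(ranks)) computed with integer arithmetic."""
    return (ranks - 1).bit_length()

def linear_bcast_time(ranks, msg_bytes, latency_s, bandwidth_bytes_per_s):
    """Root sends to each of the other ranks, one message after another."""
    return (ranks - 1) * message_time(msg_bytes, latency_s, bandwidth_bytes_per_s)

def binomial_bcast_time(ranks, msg_bytes, latency_s, bandwidth_bytes_per_s):
    """Every rank that already holds the data forwards it in each round."""
    return tree_rounds(ranks) * message_time(msg_bytes, latency_s, bandwidth_bytes_per_s)

if __name__ == "__main__":
    ranks = 2 ** 27            # the simulated rank count reported in the abstract
    msg_bytes = 1024           # assumed message size
    latency_s = 1e-6           # assumed per-message latency
    bandwidth = 1e9            # assumed link bandwidth in bytes/s
    print("linear  :", linear_bcast_time(ranks, msg_bytes, latency_s, bandwidth), "s")
    print("binomial:", binomial_bcast_time(ranks, msg_bytes, latency_s, bandwidth), "s")
```

Under this model the linear broadcast cost grows with the number of ranks, while the binomial tree cost grows with its logarithm, which is the kind of difference a simulator with virtual timing can expose without access to the full-scale machine.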