A performance model of non-deterministic particle transport on large-scale systems
ICCS'03 Proceedings of the 2003 international conference on Computational science: PartIII
In this work we present a predictive analytical model of the performance and scaling characteristics of a non-deterministic particle transport application, MCNP (Monte Carlo N-Particle), which represents part of the Advanced Simulation and Computing (ASC) workload. MCNP can simulate neutron, photon, electron, or coupled transport, and has been applied in many problem areas, including nuclear reactors, radiation shielding, and medical physics. Monte Carlo methods in general, and MCNP specifically, do not solve an explicit equation; instead they obtain answers by simulating the interactions between individual particles and a predefined geometry. This contrasts with deterministic transport methods, the most common of which is the discrete ordinates method, which solve the transport equation directly for the average particle behavior. Previous studies of the scalability of parallel Monte Carlo calculations have been rather general in nature. The performance model developed here is both detailed and parametric, taking as input application characteristics (e.g., problem size) and system characteristics (e.g., communication latency, bandwidth, and achieved processing rate). The model is validated against measurements on an AlphaServer ES40 system and shows high accuracy across many processor/problem combinations. It is then used to provide insight into the performance achievable on systems containing thousands of processors and to quantify the impact that possible improvements in sub-system performance may have. The performance impact of modifying the communication structure of the code is also quantified.
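To make the idea of a parametric model concrete, the sketch below shows how application and system characteristics might combine into a predicted run time. It is not the model developed in this work: the additive compute-plus-communication form, the two collection patterns (linear and tree), and every name in it (`System`, `predicted_time`, `n_rendezvous`, `msg_bytes`) are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class System:
    """Hypothetical system characteristics (illustrative names and units)."""
    latency_s: float      # per-message communication latency, in seconds
    bandwidth_Bps: float  # communication bandwidth, in bytes per second
    rate_pps: float       # achieved particle histories per second per processor


def predicted_time(n_particles, n_procs, n_rendezvous, msg_bytes, system, tree=False):
    """Predicted run time in seconds under an assumed additive model.

    Assumption: particle histories are divided evenly among processors, and
    at each of n_rendezvous synchronisation points every processor's results
    are collected, either one message at a time (linear) or through a
    binary tree.
    """
    compute = n_particles / (n_procs * system.rate_pps)
    per_message = system.latency_s + msg_bytes / system.bandwidth_Bps
    if tree:
        steps = (n_procs - 1).bit_length()  # ceil(log2(n_procs)); 0 when n_procs == 1
    else:
        steps = n_procs - 1                 # one message per non-root processor
    return compute + n_rendezvous * steps * per_message


# Illustrative parameter values only; these are not measurements.
system = System(latency_s=1e-5, bandwidth_Bps=1e8, rate_pps=1e4)
for p in (1, 64, 1024):
    linear = predicted_time(1e8, p, 100, 1e5, system)
    tree = predicted_time(1e8, p, 100, 1e5, system, tree=True)
    print(f"P={p:5d}  linear={linear:10.2f} s  tree={tree:10.2f} s")
```

Under these assumptions the compute term shrinks as processors are added while the linear collection term grows with the processor count, so a change to the communication structure matters most at large scale. A detailed model would replace both terms with expressions fitted to the application and validated against measurement.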