We develop a model for the parallel performance of algorithms that consist of concurrent, two-dimensional wavefronts implemented in a message-passing environment. The model combines the separate contributions of computation and communication wavefronts. We validate the model on three supercomputer systems, with up to 500 processors, using data from an ASCI deterministic particle transport application, although the model is general to any wavefront algorithm implemented on a 2-D processor domain. We also use the model to make estimates of performance and scalability of wavefront algorithms on 100-TFLOPS computer systems expected to be in existence within the next decade. Our model shows that on a 1-billion-cell problem, single-node computation speed (not inter-processor communication performance, as is widely believed) is the bottleneck. Finally, we present preliminary considerations that reveal the additional complexity associated with modeling wavefront algorithms on reduced-connectivity network topologies, such as clusters of SMPs.