A General Predictive Performance Model for Wavefront Algorithms on Clusters of SMPs

Authors:
Adolfy Hoisie;Olaf Lubeck;Harvey Wasserman;Fabrizio Petrini;Hank Alme
Affiliations:
-;-;-;-;-
Venue:
ICPP '00 Proceedings of the 2000 International Conference on Parallel Processing
Year:
2000

Abstract

We propose and validate a closed-form, analytical, general, predictive performance model for applications based on wavefront algorithms on clusters of SMPs. Wavefront algorithms are ubiquitous in parallel computing, since they represent a means of enabling parallelism in computations that contain recurrences. Our particular interest in wavefront algorithms derives from their use in discrete ordinates neutral particle transport computations representative of ASCI, but other important uses are well known. The proposed model captures the tradeoff between processor utilization and communication requirements that is characteristic of wavefront algorithms. The general model can predict the performance of this class of applications on distributed architectures with a network of lower dimensionality than that of an MPP, of which clusters of SMPs are one example. We validate the model using a compact application from the ASCI workload on a large-scale cluster of SGI Origin 2000s at Los Alamos National Laboratory. The proposed model validates well on all cluster configurations used.
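To illustrate how a wavefront ordering exposes parallelism in a computation with a recurrence, here is a minimal Python sketch. It is not taken from the paper: the recurrence `u[i][j] = u[i-1][j] + u[i][j-1] + 1` and the function names `sweep_sequential` and `sweep_wavefront` are assumptions chosen only for illustration. Each cell depends on its two upstream neighbours, so all cells on the same anti-diagonal `i + j = d` are independent and could, in principle, be computed concurrently.

```python
def sweep_sequential(nx, ny):
    """Evaluate the illustrative recurrence in plain row-major order."""
    u = [[0] * ny for _ in range(nx)]
    for i in range(nx):
        for j in range(ny):
            up = u[i - 1][j] if i > 0 else 0
            left = u[i][j - 1] if j > 0 else 0
            u[i][j] = up + left + 1
    return u


def sweep_wavefront(nx, ny):
    """Evaluate the same recurrence one anti-diagonal (wavefront) at a time.

    Returns the grid and the number of cells on each wavefront.
    """
    u = [[0] * ny for _ in range(nx)]
    widths = []
    for d in range(nx + ny - 1):
        # All cells with i + j == d; their inputs lie on wavefront d - 1.
        cells = [(i, d - i) for i in range(max(0, d - ny + 1), min(nx, d + 1))]
        widths.append(len(cells))
        for i, j in cells:  # these iterations are mutually independent
            up = u[i - 1][j] if i > 0 else 0
            left = u[i][j - 1] if j > 0 else 0
            u[i][j] = up + left + 1
    return u, widths


if __name__ == "__main__":
    grid, widths = sweep_wavefront(3, 4)
    assert grid == sweep_sequential(3, 4)
    print(widths)
```

The wavefront widths grow from one cell, plateau, and shrink again, so the available parallelism is limited while the wavefront fills and drains. That ramp is one informal way to see why processor utilization matters for this class of algorithms; the sketch does not model communication cost.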