We propose and validate a closed-form, analytical, general, predictive performance model for applications based on wavefront algorithms on clusters of SMPs. Wavefront algorithms are ubiquitous in parallel computing because they enable parallelism in computations that contain recurrences. Our particular interest in wavefront algorithms derives from their use in discrete ordinates neutral particle transport computations representative of the ASCI workload, but other important uses are well known. The proposed model captures the tradeoff between processor utilization and communication requirements that is characteristic of wavefront algorithms. The general model can predict the performance of this class of applications on distributed architectures whose network has lower dimensionality than that of an MPP, of which clusters of SMPs are one example. We validate the model using a compact application from the ASCI workload on a large-scale cluster of SGI Origin 2000s at Los Alamos National Laboratory. The proposed model validates well on all cluster configurations used.
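The tradeoff between processor utilization and communication mentioned above can be illustrated with a toy pipeline estimate. This is a minimal sketch, not the model proposed in the paper: it assumes a sweep over a px-by-py processor grid in which each processor's work is split into equal blocks, a processor can start a block only after its upstream neighbors have finished theirs, and each block costs two messages. All function and parameter names (`wavefront_time`, `utilization`, `t_cell`, `latency`, `t_cell_comm`) are hypothetical.

```python
def utilization(px, py, n_blocks):
    """Fraction of pipeline steps in which a processor does useful work.

    Assumption: the wavefront needs (px - 1) + (py - 1) steps to reach the
    far corner of the processor grid before the pipeline is full.
    """
    fill_steps = (px - 1) + (py - 1)
    return n_blocks / (n_blocks + fill_steps)


def wavefront_time(px, py, n_work, n_blocks, t_cell, latency, t_cell_comm):
    """Toy estimate of the time for one sweep.

    n_work      : cells owned by each processor (assumed equal everywhere)
    n_blocks    : number of blocks the per-processor work is split into
    t_cell      : compute time per cell
    latency     : fixed cost per message
    t_cell_comm : transfer cost per cell of a block (illustrative scaling)
    """
    cells_per_block = n_work / n_blocks
    t_comp = cells_per_block * t_cell
    # Two messages per block: one per downstream neighbor (assumption).
    t_comm = 2 * (latency + cells_per_block * t_cell_comm)
    steps = (px - 1) + (py - 1) + n_blocks
    return steps * (t_comp + t_comm)


def best_block_count(px, py, n_work, t_cell, latency, t_cell_comm, max_blocks=64):
    """Block count in 1..max_blocks that minimizes the toy sweep time."""
    return min(
        range(1, max_blocks + 1),
        key=lambda b: wavefront_time(px, py, n_work, b, t_cell, latency, t_cell_comm),
    )
```

In this sketch, more blocks shorten the relative cost of filling the pipeline, which raises utilization, but each extra block pays the message latency again. `best_block_count` simply scans for the block count where those two effects balance under the stated assumptions.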