This paper evaluates the IBM SP2 architecture, the AIX parallel programming environment, and the IBM message-passing library (MPL) through STAP (Space-Time Adaptive Processing) benchmark experiments. Because of the SP2's high communication overhead, only coarse-grain parallelism was exploited. A new parallelization scheme is developed for programming message-passing multicomputers. The parallel STAP benchmark structures are illustrated through domain decomposition, efficient mapping of partitioned programs, and optimization of collective communication operations. SP2 performance is measured in terms of execution time, Gflop/s rate, speedup over a single SP2 node, and overall system utilization. With 256 nodes, the Maui SP2 achieved its best performance of 23 Gflop/s in executing the High-Order Post-Doppler program, corresponding to 34% system utilization. A scalability analysis reveals the performance growth rate as a function of machine size and STAP problem size. Lessons learned from these parallel processing benchmark experiments are discussed in the context of real-time, adaptive radar signal processing on massively parallel processors (MPPs).
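As a sanity check on the reported figures, the abstract's utilization metric can be reproduced with a short sketch. This is not the authors' code; the per-node peak of 0.266 Gflop/s is an assumption (a commonly quoted peak for SP2 POWER2 thin nodes, 66.5 MHz at 4 floating-point operations per cycle) and is not stated in the abstract itself.

```python
# Hedged sketch: recompute system utilization from the abstract's numbers.
# Assumption: each SP2 node peaks at ~0.266 Gflop/s (not given in the abstract).

PEAK_PER_NODE_GFLOPS = 0.266   # assumed per-node peak rate
NODES = 256                    # machine size used on the Maui SP2
SUSTAINED_GFLOPS = 23.0        # reported High-Order Post-Doppler rate

def system_utilization(sustained, nodes, peak_per_node):
    """Sustained Gflop/s divided by aggregate peak Gflop/s."""
    return sustained / (nodes * peak_per_node)

util = system_utilization(SUSTAINED_GFLOPS, NODES, PEAK_PER_NODE_GFLOPS)
print(f"Utilization on {NODES} nodes: {util:.1%}")
```

Under the assumed per-node peak, this yields roughly 0.34, consistent with the 34% system utilization reported above.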