This paper considers the class of parallel computations represented by directed acyclic task graphs. This class includes parallel loops, multiphase algorithms, partitioning and merging algorithms, and any other parallel computation that can be structured as a task graph. The paper reviews the current state of the art in stochastic bound models of parallel programs and presents new stochastic bound performance models. These models predict the expected execution time of parallel programs on a given shared-memory multiprocessor system, and they describe, both qualitatively and quantitatively, the relationships between the structure of a parallel program, its computation and synchronization behavior, and the architectural features of the underlying multiprocessor system. The models use a new formulation based on stochastic bound analysis and are solvable for a number of distribution functions. They are applicable to shared-memory multiprocessors with significantly different architectural and synchronization performance characteristics. The accuracy of the models is validated with several measurements on two different shared-memory multiprocessor systems, the Alliant FX/2800 and the Encore Multimax. The results show the models to be quite accurate, even when some of the modeling assumptions are violated: the maximum prediction error ranges from under 1% to about 10%.
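To make the idea of bounding the expected execution time of a task graph concrete, the following is a minimal sketch and not the models presented in the paper. It assumes an idealized machine with unlimited processors and zero synchronization cost, exponentially distributed task times, and a hypothetical fork-join graph. All names (`makespan`, `estimate_expected_makespan`, `graph`, `means`) are illustrative. It uses two elementary bounds: the critical path computed from mean task times is a lower bound on the expected execution time (by Jensen's inequality, since the completion time is a maximum of sums of task times), and the sum of all mean task times (fully serial execution) is an upper bound.

```python
import random

def makespan(graph, durations):
    """Completion time of a task graph given one duration per task.

    `graph` maps each task to the list of its predecessors and must be
    listed in topological order (predecessors before successors).
    Assumes unlimited processors and no synchronization overhead.
    """
    finish = {}
    for task, preds in graph.items():
        # A task starts when its last predecessor finishes.
        start = max((finish[p] for p in preds), default=0.0)
        finish[task] = start + durations[task]
    return max(finish.values())

def estimate_expected_makespan(graph, means, samples=20000, seed=0):
    """Monte Carlo estimate of the expected completion time.

    Task times are drawn as independent exponentials with the given
    means (an assumption made only for this illustration).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        durations = {t: rng.expovariate(1.0 / means[t]) for t in graph}
        total += makespan(graph, durations)
    return total / samples

# Hypothetical fork-join graph: one fork task, three parallel tasks, one join.
graph = {
    "fork": [],
    "a": ["fork"],
    "b": ["fork"],
    "c": ["fork"],
    "join": ["a", "b", "c"],
}
means = {"fork": 1.0, "a": 2.0, "b": 2.0, "c": 2.0, "join": 1.0}

lower = makespan(graph, means)   # critical path on mean task times
upper = sum(means.values())      # fully serial execution
estimate = estimate_expected_makespan(graph, means)

print(f"lower bound {lower:.2f} <= estimate {estimate:.2f} <= upper bound {upper:.2f}")
```

For this particular graph the exact expectation is known in closed form, because the expected maximum of three independent exponentials with mean 2 is 2(1 + 1/2 + 1/3), giving 1 + 2(11/6) + 1, or about 5.67. The gap between that value and the lower bound of 4 comes entirely from the variability of the parallel tasks at the join, which is the kind of synchronization effect that stochastic bound models aim to capture.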