Statistics for parallelism and abstraction level in digital simulation
DAC '87 Proceedings of the 24th ACM/IEEE Design Automation Conference
Probability and Statistics with Reliability, Queuing and Computer Science Applications
Computer Architecture and Parallel Processing
Unified Methods for VLSI Simulation and Test Generation
Interconnection Networks for Parallel and Distributed Processing
A Distributed Prolog System with And Parallelism
IEEE Software
Predicting the Running Times of Parallel Programs by Simulation
IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
A simulator for adaptive parallel applications
Journal of Computer and System Sciences
A simulator for parallel applications with dynamically varying compute node allocation
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
As more processors are added to a parallel processing system, the speedup gain diminishes. This behavior has been observed for several synchronized iterative algorithms. In this paper, we present a new statistical model of parallel processing for evaluating the performance of such algorithms on multiprocessor systems. A task is characterized by two parameters: the number of atoms and the activity. An atom is the smallest unit of computation, one that cannot be distributed across multiple processors, and all atoms of a task are assumed to require equal computational effort. Each atom of the task becomes active with a fixed probability a, called the activity. The task is divided equally among the processors, and the computation is synchronized at periodic instants, when results can be shared. The amount of computational activity of a processor in the period between two synchronizations is assumed to be a binomial random variable. The performance of the multiprocessor system is derived from the maximum order statistic of these random variables. The theoretical performance predicted by our analysis agrees well with the reported experimental performance of logic simulation of production VLSI chips. Even when the inter-processor communication overhead is neglected, the analysis explains many observed phenomena. The speedup continues to increase with the number of processors and assumes values around a × p, where p is the number of processors. As p is increased, the speedup rapidly changes from p to a × p. When the atoms can be divided equally among the p processors, the lower bound on speedup is found to be a × p. For unequal division of atoms among processors, however, the lower bound on speedup is less than a × p. Interestingly, for very low activity, speedups significantly higher than the lower bound are possible.
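The model described in the abstract can be explored numerically. The following is a minimal sketch, not the authors' derivation. It assumes that the atoms divide equally among the processors, that the binomial workloads of different processors are independent, that 0 < a < 1, and that speedup is taken as the ratio of the expected single-processor work per period to the expected maximum per-processor work. The function names (`binomial_cdf`, `expected_max`, `speedup`) are hypothetical, and communication overhead is neglected.

```python
from math import exp, lgamma, log


def binomial_cdf(n, a):
    """Return [P(X <= k) for k in 0..n] for X ~ Binomial(n, a), with 0 < a < 1."""
    cdf, total = [], 0.0
    for k in range(n + 1):
        # Evaluate the probability mass in log space to avoid overflow for large n.
        log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                   + k * log(a) + (n - k) * log(1.0 - a))
        total += exp(log_pmf)
        cdf.append(min(total, 1.0))  # clamp rounding error
    return cdf


def expected_max(n, a, p):
    """Expected maximum of p independent Binomial(n, a) variables.

    Uses E[max] = sum_{k=0}^{n-1} (1 - F(k)^p), where F is the binomial CDF.
    """
    cdf = binomial_cdf(n, a)
    return sum(1.0 - cdf[k] ** p for k in range(n))


def speedup(num_atoms, a, p):
    """Assumed speedup: expected active atoms / expected busiest-processor load.

    Requires num_atoms to be divisible by p (equal division of atoms).
    """
    if num_atoms % p != 0:
        raise ValueError("this sketch assumes equal division of atoms")
    atoms_per_processor = num_atoms // p
    return (num_atoms * a) / expected_max(atoms_per_processor, a, p)


# Compare the modeled speedup with the ideal p and the lower bound a times p.
for p in (1, 2, 8, 60, 240):
    print(p, round(speedup(240, 0.1, p), 3), "lower bound:", 0.1 * p)
```

Under these assumptions the computed speedup always lies between a times p and p: the busiest processor can never have more active atoms than it was assigned, and its expected load is never below the per-processor average.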