Performance estimation in a massively parallel system

Authors:
Vishwani D. Agrawal; Srimat T. Chakradhar
Affiliations:
AT&T Bell Laboratories, Murray Hill, NJ; NEC Research Institute, Princeton, NJ and Department of Computer Science, Rutgers University, New Brunswick, NJ
Venue:
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Year:
1990

Abstract

As more processors are added to a parallel processing system, the speedup gain diminishes. This behavior has been observed for several synchronized iterative algorithms. In this paper, we present a new statistical model of parallel processing for evaluating the performance of such algorithms on multiprocessor systems. A task is characterized by two parameters: the number of atoms and the activity. An atom is the smallest unit of computation that cannot be distributed across multiple processors, and all atoms of a task are assumed to require equal computational effort. Each atom of the task becomes active with a fixed probability a, called the activity. The task is divided equally among the processors, and the computation is synchronized at periodic instants when results can be shared. The amount of computational activity of a processor within the period between synchronizations is assumed to be a binomial random variable. The performance of the multiprocessor system is derived from the maximum order statistic of these random variables. The theoretical performance predicted by our analysis agrees well with the reported experimental performance of logic simulation of production VLSI chips. Even when the inter-processor communication overhead is neglected, the analysis explains many observed phenomena. The speedup continues to increase with the number of processors and assumes values around a × p, where p is the number of processors. As p is increased, the speedup rapidly changes from p to a × p. When the atoms can be divided equally among the p processors, the lower bound on speedup is found to be a × p. For unequal division of atoms among processors, however, the lower bound on speedup is less than a × p. Interestingly, for very low activity, speedups significantly higher than the lower bounds are possible.
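
The model described in the abstract can be illustrated with a small numerical sketch. It is not the authors' derivation: it assumes that speedup is the ratio of the expected single-processor work per period (N × a for N atoms) to the expected maximum of p independent Binomial(N/p, a) loads, that the atoms divide equally among processors, and that communication overhead is zero. The function names (binomial_cdf, expected_max_load, estimated_speedup) are hypothetical.

```python
from math import exp, lgamma, log


def binomial_cdf(n, a):
    """Return [P(X <= k) for k = 0..n], where X ~ Binomial(n, a) and 0 < a < 1."""
    cdf, total = [], 0.0
    for k in range(n + 1):
        # Evaluate the pmf in log space to avoid overflow for large n.
        log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                   + k * log(a) + (n - k) * log(1.0 - a))
        total += exp(log_pmf)
        cdf.append(min(total, 1.0))  # clip rounding error above 1
    return cdf


def expected_max_load(n, a, p):
    """Expected maximum of p independent Binomial(n, a) loads."""
    cdf = binomial_cdf(n, a)
    # E[max] = sum_{k=0}^{n-1} P(max > k) = sum_{k=0}^{n-1} (1 - F(k)^p)
    return sum(1.0 - cdf[k] ** p for k in range(n))


def estimated_speedup(num_atoms, a, p):
    """Assumed speedup: expected serial work over expected busiest-processor work."""
    if not 0.0 < a < 1.0:
        raise ValueError("activity a must lie strictly between 0 and 1")
    if p < 1 or num_atoms % p != 0:
        raise ValueError("this sketch assumes atoms divide equally among processors")
    n = num_atoms // p  # atoms per processor
    return num_atoms * a / expected_max_load(n, a, p)


if __name__ == "__main__":
    # Illustrative parameters only; they are not taken from the paper.
    for p in (1, 2, 4, 8, 10, 100):
        print(p, estimated_speedup(1000, 0.1, p))
```

Under these assumptions the busiest processor can never have more than N/p active atoms, so the estimate cannot fall below a × p, and it cannot exceed p because the expected maximum is at least the expected load N × a / p of any one processor.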