Performance estimation in a massively parallel system

  • Authors: Vishwani D. Agrawal; Srimat T. Chakradhar

  • Affiliations: AT&T Bell Laboratories, Murray Hill, NJ; NEC Research Institute, Princeton, NJ and Department of Computer Science, Rutgers University, New Brunswick, NJ

  • Venue: Proceedings of the 1990 ACM/IEEE conference on Supercomputing
  • Year: 1990

Abstract

As more processors are added to a parallel processing system, the speedup gain diminishes. This behavior has been observed for several synchronized iterative algorithms. In this paper, we present a new statistical model of parallel processing for evaluating the performance of such algorithms on multiprocessor systems. A task is characterized by two parameters: the number of atoms and the activity. An atom is the smallest part of the computation that cannot be distributed to multiple processors, and all atoms of a task are assumed to be equal in computational effort. Furthermore, atoms of the task become active with a fixed probability a, called the activity. The task is divided equally among the processors, and the computation is synchronized at periodic instants when the results can be shared. The amount of computational activity of a processor within the period between synchronizations is assumed to be a binomial random variable. The performance of the multiprocessor system is derived from the maximum order statistic of these random variables. The theoretical performance predicted by our analysis agrees well with the reported experimental performance of logic simulation of production VLSI chips. Even when the inter-processor communication overhead is neglected, the analysis explains many observed phenomena. The speedup continues to increase with the number of processors and assumes values around a × p, where p is the number of processors. As p is increased, the speedup rapidly changes from p to a × p. When the atoms can be divided equally among the p processors, the lower bound on speedup is found to be a × p. For unequal division of atoms among processors, however, the lower bound on speedup is less than a × p. Interestingly, for very low activity, speedups significantly higher than the lower bounds are possible.
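The model in the abstract can be illustrated with a small numerical sketch. This is not the authors' derivation, only one reading of the abstract under stated assumptions: the N atoms divide evenly among the p processors, each processor's active-atom count in a synchronization period is an independent Binomial(N/p, a) variable, the period lasts as long as the busiest processor needs, a single processor needs N × a atom evaluations on average, and communication overhead is ignored. The function names are hypothetical. The expected maximum uses the standard identity E[max] = sum over k from 0 to n-1 of (1 - F(k)^p), where F is the binomial distribution function.

```python
import math


def binomial_pmf(n, k, a):
    """P(X = k) for X ~ Binomial(n, a), computed in log space for stability."""
    log_p = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
             + k * math.log(a) + (n - k) * math.log(1.0 - a))
    return math.exp(log_p)


def expected_max_active(n, a, p):
    """Expected maximum of p independent Binomial(n, a) variables.

    Uses E[max] = sum_{k=0}^{n-1} (1 - F(k)**p), with F the binomial CDF.
    """
    cdf = 0.0
    total = 0.0
    for k in range(n):
        cdf = min(1.0, cdf + binomial_pmf(n, k, a))
        total += 1.0 - cdf ** p
    return total


def estimated_speedup(num_atoms, a, p):
    """Speedup estimate under the assumptions stated in the lead-in.

    Assumption: atoms divide evenly among processors and 0 < a < 1.
    """
    if num_atoms % p != 0:
        raise ValueError("this sketch assumes an equal division of atoms")
    if not 0.0 < a < 1.0:
        raise ValueError("activity must lie strictly between 0 and 1")
    atoms_per_processor = num_atoms // p
    single_processor_work = num_atoms * a  # mean of Binomial(num_atoms, a)
    return single_processor_work / expected_max_active(atoms_per_processor, a, p)


if __name__ == "__main__":
    for p in (1, 10, 100):
        s = estimated_speedup(1000, 0.1, p)
        print(f"p={p:4d}  speedup={s:.2f}  lower bound a*p={0.1 * p:.1f}")
```

Because the busiest processor can never have more than N/p active atoms, the estimate is never below a × p, and because the maximum is never below the mean, it is never above p. This matches the bounds quoted in the abstract for the equal-division case.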