Landing CG on EARTH: a case study of fine-grained multithreading on an evolutionary path

  • Authors:
  • Kevin B. Theobald; Gagan Agrawal; Rishi Kumar; Gerd Heber; Guang R. Gao; Paul Stodghill; Keshav Pingali

  • Affiliations:
  • Department of Electrical and Computer Engineering, University of Delaware; Department of Computer and Information Sciences, University of Delaware; Department of Electrical and Computer Engineering, University of Delaware; Cornell Theory Center, Cornell University; Department of Electrical and Computer Engineering, University of Delaware; Department of Computer Science, Cornell University; Department of Computer Science, Cornell University

  • Venue:
  • Proceedings of the 2000 ACM/IEEE conference on Supercomputing
  • Year:
  • 2000

Abstract

We report on our work in developing a fine-grained multithreaded solution for the communication-intensive Conjugate Gradient (CG) problem. In our recent work, we developed a simple yet very efficient solution for executing sparse matrix-vector multiply (MVM) on a multithreaded system. This paper presents an effective mechanism for the reduction-broadcast phase, which we implemented and integrated with the sparse MVM, resulting in a scalable implementation of the complete CG application. Three major observations from our experiments on the EARTH multithreaded testbed are: (1) Our CG implementation scales well; for example, it achieves a speedup of 90 on 120 processors for the NAS CG class B input. (2) Our dataflow-style reduction-broadcast network, based on fine-grain multithreading, is twice as fast as a serial reduction scheme on the same system. (3) Slowing down the network by a factor of 2 caused no notable degradation in overall CG performance.
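The abstract does not describe how the reduction-broadcast network is built. As a rough illustration only, the Python sketch below shows where CG needs global reductions (two dot products per iteration, each a local partial sum followed by a reduction whose result every processor needs) and contrasts a serial gather-then-scatter scheme with a binary-tree reduction-broadcast. It is not the EARTH implementation. The binary-tree topology, the round-counting latency model, and all names (`serial_allreduce`, `tree_allreduce`, `cg`, `laplacian_1d`) are hypothetical assumptions.

```python
import math


def serial_allreduce(partials):
    """Serial scheme (assumed for illustration): node 0 receives each partial
    sum one after another, then sends the total to each other node in turn."""
    total = partials[0]
    steps = 0
    for value in partials[1:]:
        total += value
        steps += 1  # one sequential receive per remote node
    steps += len(partials) - 1  # one sequential send per remote node
    return [total] * len(partials), steps


def tree_allreduce(partials):
    """Tree scheme (assumed for illustration): combine pairwise up a binary
    tree, then broadcast back down it. All pairs on one level are independent,
    so each level counts as a single communication round."""
    vals = list(partials)
    p = len(vals)
    steps = 0
    stride = 1
    while stride < p:  # reduction phase: the total accumulates in vals[0]
        for i in range(0, p, 2 * stride):
            if i + stride < p:
                vals[i] += vals[i + stride]
        steps += 1
        stride *= 2
    while stride > 1:  # broadcast phase mirrors the reduction tree
        stride //= 2
        for i in range(0, p, 2 * stride):
            if i + stride < p:
                vals[i + stride] = vals[i]
        steps += 1
    return vals, steps


def cg(A, b, nprocs, allreduce, max_iter=200, tol=1e-10):
    """Unpreconditioned CG on a sparse SPD matrix A, stored as a list of rows
    of (column, value) pairs. Rows are block-partitioned over nprocs simulated
    processors; every dot product is a local sum followed by a global
    reduction-broadcast. Returns the solution and total communication rounds."""
    n = len(b)
    bounds = [(k * n // nprocs, (k + 1) * n // nprocs) for k in range(nprocs)]
    rounds = 0

    def global_dot(u, v):
        nonlocal rounds
        partials = [sum(u[i] * v[i] for i in range(lo, hi)) for lo, hi in bounds]
        vals, steps = allreduce(partials)
        rounds += steps
        return vals[0]

    x = [0.0] * n
    r = list(b)
    p = list(r)
    rr = global_dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rr) < tol:
            break
        # Sparse MVM; in this sketch every simulated processor reads the full p.
        q = [sum(val * p[j] for j, val in A[i]) for i in range(n)]
        alpha = rr / global_dot(p, q)  # reduction-broadcast no. 1
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rr_new = global_dot(r, r)  # reduction-broadcast no. 2
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, rounds


def laplacian_1d(n):
    """Hypothetical test matrix: the SPD tridiagonal matrix tridiag(-1, 2, -1)."""
    rows = []
    for i in range(n):
        row = [(i, 2.0)]
        if i > 0:
            row.append((i - 1, -1.0))
        if i < n - 1:
            row.append((i + 1, -1.0))
        rows.append(row)
    return rows


if __name__ == "__main__":
    A = laplacian_1d(16)
    b = [1.0] * 16
    for name, scheme in (("serial", serial_allreduce), ("tree", tree_allreduce)):
        _, rounds = cg(A, b, nprocs=16, allreduce=scheme)
        print(name, "communication rounds:", rounds)
```

With 16 simulated processors, each global dot product costs the serial scheme 30 rounds (15 receives plus 15 sends) and the tree scheme 8 rounds (4 levels up, 4 down). This round count is only a simplified latency model. It ignores message cost, overlap of communication with computation, and the fine-grain multithreading that the paper's dataflow-style network relies on, so it does not reproduce the factor-of-2 result reported above.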