Automatic compiler techniques for thread coarsening for multithreaded architectures

  • Authors:
  • Gary M. Zoppetti;Gagan Agrawal;Lori Pollock;Jose Nelson Amaral;Xinan Tang;Guang Gao

  • Affiliations:
  • Department of Computer and Information Sciences, University of Delaware, Newark DE;Department of Computer and Information Sciences, University of Delaware, Newark DE;Department of Computer and Information Sciences, University of Delaware, Newark DE;Department of Electrical and Computer Engineering, University of Delaware, Newark DE;Chameleon Systems Inc., Sunnyvale, CA and Department of Electrical and Computer Engineering, University of Delaware, Newark DE;Department of Electrical and Computer Engineering, University of Delaware, Newark DE

  • Venue:
  • Proceedings of the 14th international conference on Supercomputing
  • Year:
  • 2000


Abstract

Multithreaded architectures are emerging as an important class of parallel machines. By allowing fast context switching between threads on the same processor, these systems hide communication and synchronization latencies and allow scalable parallelism for dynamic and irregular applications. Thread partitioning is the most important task in compiling high-level languages for multithreaded architectures. Non-preemptive multithreaded architectures, which can be built from off-the-shelf components, require that if a thread issues a potentially remote memory request, then any statement that is dependent upon this request must be in a separate thread. When performing thread partitioning on codes that use pointer-based recursive data structures, it is often difficult to extract accurate dependence information. As a result, threads of unnecessarily small granularity are generated, which, because of thread switching costs, leads to increased execution time. In this paper, we present three techniques that lead to improved extraction and representation of dependence information in the presence of structured control flow, references through fields of structures, and pointer-based data structures. The benefit of these techniques is the generation of coarser-grained threads and, therefore, decreased execution time. Our experiments were performed using the EARTH-C compiler and the EARTH multithreaded architecture model emulated on both a cluster of Pentium PCs and a distributed memory multiprocessor. On our set of 6 pointer-based programs, these techniques reduced the static number of threads by 38%. Reductions in execution times ranged from 16% to 45% on the four programs for which we measured runtime performance.
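The sketch below is a hypothetical illustration, in plain C rather than EARTH-C, of the partitioning constraint described in the abstract: on a non-preemptive multithreaded machine, every potentially remote load forces its dependent statements into a separate thread, while more precise dependence information lets the compiler keep provably local accesses together in a coarser thread. The thread-boundary comments mark where a partitioner would split the code; they are not actual EARTH-C constructs.

```c
/*
 * Hypothetical illustration (not EARTH-C syntax): summing a linked list
 * whose nodes may reside in remote memory on a multithreaded machine.
 * Each potentially remote load would normally end the current thread,
 * with the dependent statements placed in a new thread; accurate
 * dependence analysis lets accesses known to be local stay in the same,
 * coarser-grained thread, avoiding extra thread switches.
 */
#include <stdio.h>

struct node {
    int          value;   /* payload, local once the node is fetched        */
    struct node *next;    /* pointer that may reference a remote node       */
};

long sum_list(struct node *head)
{
    long total = 0;
    for (struct node *p = head; p != NULL; p = p->next) {
        /* p->value and p->next are potentially remote loads: a naive
         * partitioner ends the thread here and places the dependent
         * statements (the addition, the next iteration) in a new thread.
         * If analysis proves both fields live in the same already-fetched
         * node, both uses remain in one coarser thread. */
        total += p->value;
    }
    return total;
}

int main(void)
{
    /* Build a small local list just to make the sketch runnable. */
    struct node c = { 3, NULL }, b = { 2, &c }, a = { 1, &b };
    printf("sum = %ld\n", sum_list(&a));
    return 0;
}
```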