Automatic compiler techniques for thread coarsening for multithreaded architectures

  • Authors:
  • Gary M. Zoppetti;Gagan Agrawal;Lori Pollock;Jose Nelson Amaral;Xinan Tang;Guang Gao

  • Affiliations:
  • Department of Computer and Information Sciences, University of Delaware, Newark DE;Department of Computer and Information Sciences, University of Delaware, Newark DE;Department of Computer and Information Sciences, University of Delaware, Newark DE;Department of Electrical and Computer Engineering, University of Delaware, Newark DE;Chameleon Systems Inc., Sunnyvale, CA and Department of Electrical and Computer Engineering, University of Delaware, Newark DE;Department of Electrical and Computer Engineering, University of Delaware, Newark DE

  • Venue:
  • Proceedings of the 14th international conference on Supercomputing
  • Year:
  • 2000


Abstract

Multithreaded architectures are emerging as an important class of parallel machines. By allowing fast context switching between threads on the same processor, these systems hide communication and synchronization latencies and allow scalable parallelism for dynamic and irregular applications. Thread partitioning is the most important task in compiling high-level languages for multithreaded architectures. Non-preemptive multithreaded architectures, which can be built from off-the-shelf components, require that if a thread issues a potentially remote memory request, then any statement that is dependent upon this request must be in a separate thread. When performing thread partitioning on codes that use pointer-based recursive data structures, it is often difficult to extract accurate dependence information. As a result, threads of unnecessarily small granularity are generated, which, because of thread switching costs, leads to increased execution time. In this paper, we present three techniques that lead to improved extraction and representation of dependence information in the presence of structured control flow, references through fields of structures, and pointer-based data structures. The benefit of these techniques is the generation of coarser-grained threads and, therefore, decreased execution time. Our experiments were performed using the EARTH-C compiler and the EARTH multithreaded architecture model emulated on both a cluster of Pentium PCs and a distributed memory multiprocessor. On our set of 6 pointer-based programs, these techniques reduced the static number of threads by 38%. Reductions in execution times ranged from 16% to 45% on the four programs for which we measured runtime performance.
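The sketch below is a hypothetical illustration, in plain C rather than EARTH-C, of the partitioning constraint described in the abstract: on a non-preemptive multithreaded machine, every potentially remote load forces its dependent statements into a separate thread, while more precise dependence information lets the compiler keep provably local accesses together in a coarser thread. The thread-boundary comments mark where a partitioner would split the code; they are not actual EARTH-C constructs.

```c
/*
 * Hypothetical illustration (not EARTH-C syntax): summing a linked list
 * whose nodes may reside in remote memory on a multithreaded machine.
 * Each potentially remote load would normally end the current thread,
 * with the dependent statements placed in a new thread; accurate
 * dependence analysis lets accesses known to be local stay in the same,
 * coarser-grained thread, avoiding extra thread switches.
 */
#include <stdio.h>

struct node {
    int          value;   /* payload, local once the node is fetched        */
    struct node *next;    /* pointer that may reference a remote node       */
};

long sum_list(struct node *head)
{
    long total = 0;
    for (struct node *p = head; p != NULL; p = p->next) {
        /* p->value and p->next are potentially remote loads: a naive
         * partitioner ends the thread here and places the dependent
         * statements (the addition, the next iteration) in a new thread.
         * If analysis proves both fields live in the same already-fetched
         * node, both uses remain in one coarser thread. */
        total += p->value;
    }
    return total;
}

int main(void)
{
    /* Build a small local list just to make the sketch runnable. */
    struct node c = { 3, NULL }, b = { 2, &c }, a = { 1, &b };
    printf("sum = %ld\n", sum_list(&a));
    return 0;
}
```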