Communication optimizations for parallel C programs

Authors:
Yingchun Zhu; Laurie J. Hendren
Affiliations:
School of Computer Science, McGill University, Montreal, Quebec, Canada H3A 2A7;School of Computer Science, McGill University, Montreal, Quebec, Canada H3A 2A7
Venue:
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Year:
1998

Abstract

This paper presents algorithms for reducing the communication overhead of parallel C programs that use dynamically allocated data structures. The framework consists of an analysis phase, called possible-placement analysis, and a transformation phase, called communication selection.

The fundamental idea of possible-placement analysis is to find all possible points for the insertion of remote memory operations. Remote reads are propagated upwards, whereas remote writes are propagated downwards. Based on the results of the possible-placement analysis, the communication selection transformation selects the "best" place for inserting the communication and determines whether pipelining or blocking of communication should be performed.

The framework has been implemented in the EARTH-McCAT optimizing/parallelizing C compiler, and experimental results are presented for five pointer-intensive benchmarks running on the EARTH-MANNA distributed-memory parallel architecture. These experiments show that the communication optimization can provide performance improvements of up to 16% over the unoptimized benchmarks.
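To make the effect of blocking concrete, the following is a minimal sketch in plain C, not the paper's algorithm and not EARTH-C code. It assumes a hypothetical `Point` record that lives on a remote node and simulates remote memory with ordinary pointers; the helpers `remote_read_double` and `remote_read_block` and the counter `remote_msgs` are invented for illustration and simply count how many messages each version would send. The first function issues one remote read per field at the point of use; the second shows what the code might look like once the three reads have been moved up to a common earlier point and merged into a single blocked transfer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record assumed to live on a remote node. */
typedef struct { double x, y, z; } Point;

/* Number of simulated network messages sent so far (illustration only). */
int remote_msgs = 0;

/* Simulated remote read of one scalar: costs one message per call. */
double remote_read_double(const double *addr) {
    assert(addr != NULL);
    remote_msgs++;
    return *addr;
}

/* Simulated blocked remote read: one message copies n bytes at once. */
void remote_read_block(void *dst, const void *src, size_t n) {
    assert(dst != NULL && src != NULL);
    remote_msgs++;
    memcpy(dst, src, n);
}

/* Unoptimized form: every field access is its own remote read. */
double norm2_naive(const Point *p) {
    double x = remote_read_double(&p->x);
    double y = remote_read_double(&p->y);
    double z = remote_read_double(&p->z);
    return x * x + y * y + z * z;
}

/* Optimized form: the reads are hoisted to the function entry and
 * blocked into a single transfer into a local copy. */
double norm2_blocked(const Point *p) {
    Point local;
    remote_read_block(&local, p, sizeof local);
    return local.x * local.x + local.y * local.y + local.z * local.z;
}
```

Both functions compute the same value, but in this simulation the naive version sends three messages per call and the blocked version sends one. Whether blocking pays off on a real machine depends on the size of the record and on how many of its fields are actually used, which is the kind of decision the abstract attributes to communication selection.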