Communication optimization and code generation for distributed memory machines
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
GIVE-N-TAKE—a balanced code placement framework
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Context-sensitive interprocedural points-to analysis in the presence of function pointers
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Optimal code motion: theory and practice
ACM Transactions on Programming Languages and Systems (TOPLAS)
Supporting dynamic data structures on distributed-memory machines
ACM Transactions on Programming Languages and Systems (TOPLAS)
Optimizing parallel programs with explicit synchronization
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Interprocedural partial redundancy elimination and its application to distributed memory compilation
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Software caching and computation migration in Olden
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Global communication analysis and optimization
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Polling watchdog: combining polling and interrupts for efficient message handling
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Compiler-based prefetching for recursive data structures
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Olden: parallelizing programs with dynamic data structures on distributed-memory machines
Olden: parallelizing programs with dynamic data structures on distributed-memory machines
A study of the EARTH-MANNA multithreaded system
International Journal of Parallel Programming - Special issue on parallel architectures and compilation techniques—part II
Synchronization transformations for parallel computing
Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Compiling C for the EARTH multithreaded architecture
International Journal of Parallel Programming - Special issue: selected papers from PACT'96, fourth international conference on parallel architectures and compilation techniques—part 1
Putting pointer analysis to work
POPL '98 Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Latency Hiding in Message-Passing Architectures
Proceedings of the 8th International Symposium on Parallel Processing
Locality Analysis For Parallel C Programs
PACT '97 Proceedings of the 1997 International Conference on Parallel Architectures and Compilation Techniques
Heap Analysis And Optimizations For Threaded Programs
PACT '97 Proceedings of the 1997 International Conference on Parallel Architectures and Compilation Techniques
Putting pointer analysis to work
Putting pointer analysis to work
Locality Analysis for Parallel C Programs
IEEE Transactions on Parallel and Distributed Systems
Pointer analysis for multithreaded programs
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Automatic compiler techniques for thread coarsening for multithreaded architectures
Proceedings of the 14th international conference on Supercomputing
Pointer analysis for structured parallel programs
ACM Transactions on Programming Languages and Systems (TOPLAS)
Compiler and Runtime Support for Irregular Reductions on a Multithreaded Architecture
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Analysis of Multithreaded Programs
SAS '01 Proceedings of the 8th International Symposium on Static Analysis
Pointer analysis of multithreaded Java programs
Proceedings of the 2003 ACM symposium on Applied computing
Communication Optimizations for Fine-Grained UPC Applications
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Making Sequential Consistency Practical in Titanium
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Shared memory programming for large scale machines
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Automatic nonblocking communication for partitioned global address space programs
Proceedings of the 21st annual international conference on Supercomputing
Type systems for distributed data sharing
SAS'03 Proceedings of the 10th international conference on Static analysis
Automatically generating symbolic prefetches for distributed transactional memories
Proceedings of the ACM/IFIP/USENIX 11th International Conference on Middleware
Hierarchical pointer analysis for distributed programs
SAS'07 Proceedings of the 14th international conference on Static Analysis
Hi-index | 0.00 |
This paper presents algorithms for reducing the communication overhead for parallel C programs that use dynamically-allocated data structures. The framework consists of an analysis phase called possible-placement analysis, and a transformation phase called communication selection.The fundamental idea of possible-placement analysis is to find all possible points for insertion of remote memory operations. Remote reads are propagated upwards, whereas remote writes are propagated downwards. Based on the results of the possible-placement analysis, the communication selection transformation selects the "best" place for inserting the communication, and determines if pipelining or blocking of communication should be performed.The framework has been implemented in the EARTH-McCAT optimizing/parallelizing C compiler, and experimental results are presented for five pointer-intensive benchmarks running on the EARTH-MANNA distributed-memory parallel architecture. These experiments show that the communication optimization can provide performance improvements of up to 16% over the unoptimized benchmarks.