Efficient type inference for higher-order binding-time analysis
Proceedings of the 5th ACM conference on Functional programming languages and computer architecture
Supporting dynamic data structures on distributed-memory machines
ACM Transactions on Programming Languages and Systems (TOPLAS)
Commutativity analysis: a new analysis framework for parallelizing compilers
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Polling watchdog: combining polling and interrupts for efficient message handling
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Points-to analysis in almost linear time
POPL '96 Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Olden: parallelizing programs with dynamic data structures on distributed-memory machines
Olden: parallelizing programs with dynamic data structures on distributed-memory machines
A study of the EARTH-MANNA multithreaded system
International Journal of Parallel Programming - Special issue on parallel architectures and compilation techniques—part II
Communication optimizations for parallel C programs
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Latency Hiding in Message-Passing Architectures
Proceedings of the 8th International Symposium on Parallel Processing
Designing the McCAT Compiler Based on a Family of Structured Intermediate Representations
Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing
Extended SSA Numbering: Introducing SSA Properties to Language with Multi-level Pointers
CC '98 Proceedings of the 7th International Conference on Compiler Construction
Compiling C for the EARTH Multithreaded Architecture
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
A Framework to Capture Dynamic Data Structures in Pointer-Based Codes
IEEE Transactions on Parallel and Distributed Systems
On the parallelization of irregular and dynamic programs
Parallel Computing
Static Detection of Place Locality and Elimination of Runtime Checks
APLAS '08 Proceedings of the 6th Asian Symposium on Programming Languages and Systems
Type systems for distributed data sharing
SAS'03 Proceedings of the 10th international conference on Static analysis
Hi-index | 0.00 |
Many parallel architectures support a memory model where some memory accesses are local and, thus, inexpensive, while other memory accesses are remote and potentially quite expensive. In the case of memory references via pointers, it is often difficult to determine if the memory reference is guaranteed to be local and, thus, can be handled via an inexpensive memory operation. Determining which memory accesses are local can be done by the programmer, the compiler, or a combination of both. The overall goal is to minimize the work required by the programmer and have the compiler automate the process as much as possible. This paper reports on compiler techniques for determining when indirect memory references are local. The locality analysis has been implemented for a parallel dialect of C called EARTH-C, and it uses an algorithm inspired by type inference algorithms for fast points-to analysis. The algorithm statically estimates when an indirect reference via a pointer can be safely assumed to be a local access. The locality inference algorithm is also used to guide the automatic specialization of functions in order to take advantage of locality specific to particular calling contexts. In addition to these purely static techniques, we also suggest fine-grain and coarse-grain dynamic techniques. In this case, dynamic locality checks are inserted into the program and specialized code for the local case is inserted. In the fine-grain case, the checks are put around single memory references, while in the coarse-grain case the checks are put around larger program segments. The static locality analysis and automatic specialization has been implemented in the EARTH-C compiler, which produces low-level threaded code for the EARTH multithreaded architecture. Experimental results are presented for a set of benchmarks that operate on irregular, dynamically allocated data structures. Overall, the techniques give moderate to significant speedups, with the combination of static and dynamic techniques giving the best performance overall.