Advanced compiler optimizations for supercomputers
Communications of the ACM - Special issue on parallelism
The program dependence graph and its use in optimization
ACM Transactions on Programming Languages and Systems (TOPLAS)
Run-time scheduling and execution of loops on message passing machines
Journal of Parallel and Distributed Computing - Special issue: algorithms for hypercube computers
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Efficiently computing static single assignment form and the control dependence graph
ACM Transactions on Programming Languages and Systems (TOPLAS)
Compiling Fortran D for MIMD distributed-memory machines
Communications of the ACM
Global analysis for partitioning non-strict programs into sequential threads
LFP '92 Proceedings of the 1992 ACM conference on LISP and functional programming
A general framework for iteration-reordering loop transformations
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Global optimizations for parallelism and locality on scalable parallel machines
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
A parallel hashed Oct-Tree N-body algorithm
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
A parallel adaptive fast multipole method
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Evaluating automatic parallelization for efficient execution on shared-memory multiprocessors
ICS '94 Proceedings of the 8th international conference on Supercomputing
Precise concrete type inference for object-oriented languages
OOPSLA '94 Proceedings of the ninth annual conference on Object-oriented programming systems, language, and applications
Compiler optimizations for improving data locality
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Software caching and computation migration in Olden
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Commutativity analysis: a new analysis framework for parallelizing compilers
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Thread scheduling for cache locality
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Compiler-based prefetching for recursive data structures
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Is it a tree, a DAG, or a cyclic graph? A shape analysis for heap-directed pointers in C
POPL '96 Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Automatic inline allocation of objects
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Application-specific protocols for user-level shared memory
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Parallelizing Programs with Recursive Data Structures
IEEE Transactions on Parallel and Distributed Systems
Compiler-Controlled Multithreading for Lenient Parallel Languages
Proceedings of the 5th ACM Conference on Functional Programming Languages and Computer Architecture
Compositional C++: Compositional Parallel Programming
Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing
Type Directed Cloning for Object-Oriented Programs
LCPC '95 Proceedings of the 8th International Workshop on Languages and Compilers for Parallel Computing
Connection Analysis: A Practical Interprocedural Heap Analysis for C
LCPC '95 Proceedings of the 8th International Workshop on Languages and Compilers for Parallel Computing
ICC++-AC++ Dialect for High Performance Parallel Computing
ISOTAS '96 Proceedings of the Second JSSST International Symposium on Object Technologies for Advanced Software
Supporting High Level Programming with High Performance: The Illinois Concert System
HIPS '97 Proceedings of the 1997 Workshop on High-Level Programming Models and Supportive Environments (HIPS '97)
Automatic compiler techniques for thread coarsening for multithreaded architectures
Proceedings of the 14th international conference on Supercomputing
High-Level Parallel Programming of an Adaptive Mesh Application Using the Illinois Concert System
ISCOPE '98 Proceedings of the Second International Symposium on Computing in Object-Oriented Parallel Environments
Hi-index | 0.00 |
Loop tiling and communication optimization, such as message pipelining and aggregation, can achieve optimized and robust memory performance by proactively managing storage and data movement. In this paper, we generalize these techniques to pointer-based data structures (PBDSs). Our approach, dynamic pointer alignment (DPA), has two components. The compiler decomposes a program into non-blocking threads that operate on specific pointers and labels thread creation sites with their corresponding pointers. At runtime, an explicit mapping from pointers to dependent threads is updated at thread creation and is used to dynamically schedule both threads and communication, such that threads using the same objects execute together, communication overlaps with local work, and messages are aggregated. We have implemented DPA to optimize remote reads to global PBDSs on parallel machines. Our empirical results on the force computation phases of two applications that use sophisticated PBDSs, Barnes-Hut and FMM, show that DPA achieves good absolute performance and speedups by enabling tiling and communication optimization on the CRAY T3D.