Distributing data is one of the key problems in implementing efficient distributed-memory parallel programs. The problem becomes more difficult in programs where data redistribution between computational phases is considered. The global data distribution problem is to find the optimal distribution in multi-phase parallel programs. Solving this problem requires accurate knowledge of data redistribution cost.

We are investigating this problem in the context of a software distributed shared memory (SDSM) system, in which obtaining accurate redistribution cost estimates is difficult. This is because SDSM communication is implicit: it depends on access patterns, page locations, and the SDSM consistency protocol.

We have developed integrated compile- and run-time analysis for SDSM systems to determine accurate redistribution cost estimates with low overhead. Our resulting system, SUIF-Adapt, can efficiently and accurately estimate execution time, including redistribution, to within 5% of the actual time in all of our test cases and is often much closer. These precise costs enable SUIF-Adapt to find efficient global data distributions in multi-phase programs.
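To make the global data distribution problem concrete, the sketch below treats it as a shortest-path search over phases: each phase has an estimated execution time per candidate distribution, and switching distributions between phases adds a redistribution cost. This is a generic dynamic-programming formulation, not the algorithm used by SUIF-Adapt, which the abstract does not describe. The function name, the distribution labels ("block", "cyclic"), and all cost numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def best_global_distribution(
    phase_cost: List[Dict[str, float]],
    redist_cost: Dict[Tuple[str, str], float],
) -> Tuple[float, List[str]]:
    """Choose one distribution per phase, minimizing execution plus redistribution time.

    phase_cost[i][d]    : estimated time of phase i under distribution d (assumed given)
    redist_cost[(p, d)] : estimated time to redistribute from p to d (assumed given)
    """
    # best[d] = (cheapest total time of a plan ending in distribution d, that plan)
    best = {d: (c, [d]) for d, c in phase_cost[0].items()}
    for costs in phase_cost[1:]:
        nxt = {}
        for d, c in costs.items():
            # Keeping the same distribution incurs no redistribution cost.
            total, plan = min(
                (
                    (t + (0.0 if p == d else redist_cost[(p, d)]), path)
                    for p, (t, path) in best.items()
                ),
                key=lambda entry: entry[0],
            )
            nxt[d] = (total + c, plan + [d])
        best = nxt
    return min(best.values(), key=lambda entry: entry[0])


# Hypothetical three-phase program with two candidate distributions.
example_phases = [
    {"block": 10.0, "cyclic": 14.0},
    {"block": 20.0, "cyclic": 9.0},
    {"block": 8.0, "cyclic": 12.0},
]
example_redist = {("block", "cyclic"): 3.0, ("cyclic", "block"): 3.0}
total_time, plan = best_global_distribution(example_phases, example_redist)
```

With these made-up numbers the cheapest plan redistributes twice (block, cyclic, block). If the redistribution cost were much larger, staying in one distribution would win, which is why the quality of the chosen plan depends on how accurate the redistribution estimates are.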