Memory coherence in shared virtual memory systems
ACM Transactions on Computer Systems (TOCS)
A static performance estimator to guide data partitioning decisions
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
Compiling Fortran D for MIMD distributed-memory machines
Communications of the ACM
Global optimizations for parallelism and locality on scalable parallel machines
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
CHARM++: a portable concurrent object oriented system based on C++
OOPSLA '93 Proceedings of the eighth annual conference on Object-oriented programming systems, languages, and applications
PARADIGM: a compiler for automatic data distribution on multicomputers
ICS '93 Proceedings of the 7th international conference on Supercomputing
An integrated compile-time/run-time software distributed shared memory system
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Automatic data layout for distributed-memory machines
ACM Transactions on Programming Languages and Systems (TOPLAS)
Transparent adaptive parallelism on NOWs using OpenMP
Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Journal of Parallel and Distributed Computing
Dynamic data distribution with control flow analysis
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Accurate data redistribution cost estimation in software distributed shared memory systems
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
An Implementation of Interprocedural Bounded Regular Section Analysis
IEEE Transactions on Parallel and Distributed Systems
Adaptive Load Balancing for MPI Programs
ICCS '01 Proceedings of the International Conference on Computational Science-Part II
Automatic Selection of Dynamic Data Partitioning Schemes for Distributed-Memory Multicomputers
LCPC '95 Proceedings of the 8th International Workshop on Languages and Compilers for Parallel Computing
Predicting the CPU Availability of Time-Shared Unix Systems on the Computational Grid
HPDC '99 Proceedings of the 8th IEEE International Symposium on High Performance Distributed Computing
SPLASH: Stanford parallel applications for shared-memory
SPLASH: Stanford parallel applications for shared-memory
A comparative analysis of fine-grain threads packages
Journal of Parallel and Distributed Computing
Just In Time Dynamic Voltage Scaling: Exploiting Inter-Node Slack to Save Energy in MPI Programs
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
HeteroMPI: Towards a message-passing library for heterogeneous networks of computers
Journal of Parallel and Distributed Computing
Just-in-time dynamic voltage scaling: Exploiting inter-node slack to save energy in MPI programs
Journal of Parallel and Distributed Computing
Hi-index | 0.00 |
Distributing data is a fundamental problem in implementing efficient distributed-memory parallel programs. The problem becomes more difficult in environments where the participating nodes are not dedicated to a parallel application. We are investigating the data distribution problem in non dedicated environments in the context of explicit message-passing programs. To address this problem, we have designed and implemented an extension to MPI called Dynamic MPI (Dyn-MPI). The key component of Dyn-MPI is its run-time system, which efficiently and automatically redistributes data on the fly when there are changes in the application or the underlying environment. Dyn-MPI supports efficient memory allocation, precise measurement of system load and computation time, and node removal. Performance results show that programs that use Dyn-MPI execute efficiently in non dedicated environments, including up to almost a three-fold improvement compared to programs that do not redistribute data and a 25% improvement over standard adaptive load balancing techniques.