Writing parallel programs for distributed multi-user computing environments is a difficult task. The Distributed Object Migration Environment (Dome) addresses three major issues of parallel computing in an architecture-independent manner: ease of programming, dynamic load balancing, and fault tolerance. With modest effort, Dome programmers can write parallel programs that are automatically distributed over a heterogeneous network, dynamically load balanced as the program runs, and able to survive compute node and network failures. This paper provides the motivation for and an overview of Dome, including a preliminary performance evaluation of dynamic load balancing for distributed vectors. Dome programs are shorter and easier to write than the equivalent programs written with message-passing primitives. The performance overhead of Dome is characterized, and it is shown that this overhead can be recouped by dynamic load balancing in imbalanced systems. Finally, we show that a parallel program can be made failure resilient through Dome's architecture-independent checkpoint and restart mechanisms.
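To make the load-balancing claim concrete, the following is a minimal sketch of how the elements of a distributed vector might be reassigned in proportion to each node's measured speed. It is an illustration only and not Dome's actual algorithm or API: the function names `rebalance` and `phase_time`, the per-element timing input, and the rounding rule are all assumptions introduced here.

```python
def rebalance(total_elements, seconds_per_element):
    """Split total_elements across nodes in proportion to each node's speed.

    seconds_per_element[i] is a hypothetical measurement of how long node i
    took per vector element during the previous computation phase.
    """
    # Speed is the inverse of the measured time per element.
    speeds = [1.0 / t for t in seconds_per_element]
    total_speed = sum(speeds)
    # Ideal (possibly fractional) share for each node.
    ideal = [total_elements * s / total_speed for s in speeds]
    counts = [int(x) for x in ideal]
    # Hand out elements lost to truncation, largest fractional part first.
    leftover = total_elements - sum(counts)
    by_fraction = sorted(range(len(ideal)),
                         key=lambda i: ideal[i] - counts[i],
                         reverse=True)
    for i in by_fraction[:leftover]:
        counts[i] += 1
    return counts


def phase_time(counts, seconds_per_element):
    """A phase finishes only when the slowest node finishes its share."""
    return max(c * t for c, t in zip(counts, seconds_per_element))


if __name__ == "__main__":
    # Assumed scenario: three nodes, the third running at half speed.
    timings = [1.0, 1.0, 2.0]
    even = [400, 400, 400]
    balanced = rebalance(1200, timings)
    print("even split:    ", even, phase_time(even, timings))
    print("balanced split:", balanced, phase_time(balanced, timings))
```

In this assumed scenario the even split is held back by the slow node, while the proportional split lets all three nodes finish together. The gap between the two phase times is the kind of saving that would have to exceed the cost of measuring load and moving data for the overhead to be recouped.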