Clusters of workstations (COWs) and SMPs have become popular, cost-effective platforms for solving scientific problems. Because such environments may be heterogeneous and/or time-shared, dynamic load balancing is central to achieving high performance. Our thesis is that new levels of sophistication are required both in parallel algorithm design and in the interaction between the algorithms and the runtime system. To support this thesis, we present a novel approach for application-level balancing of external CPU and memory load in parallel iterative methods that employ some form of local preconditioning on each node. There are two key ideas. First, because the nodes need not all perform their portion of the preconditioning phase to the same accuracy, the code can achieve perfect load balance, dynamically adapting to external CPU load, by stopping the preconditioning phase on all processors after a fixed amount of time. Second, if the program detects memory thrashing on a node, it recedes the preconditioning phase on that node, which should speed the completion of competing jobs and thus the release of their resources. We have implemented our load-balancing approach in a state-of-the-art, coarse-grain parallel Jacobi-Davidson eigensolver. Experimental results show that the new method adapts its algorithm based on runtime-system information without compromising overall convergence behavior. We demonstrate the effectiveness of the new algorithm in a COW environment under (a) variable CPU load and (b) variable memory availability caused by competing applications.
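The two key ideas can be sketched as follows. This is a minimal single-node illustration, not the paper's implementation: the choice of Jacobi sweeps as the local preconditioner, the function and class names, and the page-fault threshold are all assumptions made for the example.

```python
import resource
import time

import numpy as np


def time_bounded_preconditioner(A, r, budget_s, tol=1e-10):
    """Approximately solve A z = r with Jacobi sweeps, stopping at
    convergence or when the wall-clock budget expires.  Under external
    CPU load a node simply completes fewer (less accurate) sweeps, so
    all nodes finish the phase at roughly the same time -- the first
    key idea.  Jacobi is an illustrative stand-in for the local
    preconditioner."""
    deadline = time.monotonic() + budget_s
    D = np.diag(A)                        # Jacobi uses only the diagonal
    z = np.zeros_like(r)
    while time.monotonic() < deadline:
        z = z + (r - A @ z) / D           # one Jacobi sweep
        if np.linalg.norm(r - A @ z) <= tol * np.linalg.norm(r):
            break                         # converged before the deadline
    return z


class ThrashingMonitor:
    """Crude stand-in for the second key idea: watch major page faults
    and, when they spike, report thrashing so the caller can recede
    (skip or shorten) the preconditioning phase on this node.  The
    fault threshold is an illustrative assumption."""

    def __init__(self, fault_threshold=50):
        self.threshold = fault_threshold
        self.last = resource.getrusage(resource.RUSAGE_SELF).ru_majflt

    def thrashing(self):
        now = resource.getrusage(resource.RUSAGE_SELF).ru_majflt
        delta, self.last = now - self.last, now
        return delta > self.threshold


# Usage: a diagonally dominant test system and a 50 ms budget; if the
# node is thrashing, recede to the identity preconditioner (z = r).
rng = np.random.default_rng(0)
n = 100
A = 4.0 * np.eye(n) + 0.01 * rng.standard_normal((n, n))
b = rng.standard_normal(n)
monitor = ThrashingMonitor()
z = b.copy() if monitor.thrashing() else time_bounded_preconditioner(A, b, 0.05)
```

The point of the fixed time budget is that the stopping criterion becomes identical on every node regardless of how much CPU each one actually received, so no node waits at the next synchronization point; the iterative method tolerates the resulting variable preconditioning accuracy.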