Allocating Independent Subtasks on Parallel Processors
IEEE Transactions on Software Engineering
Guided self-scheduling: A practical scheduling scheme for parallel supercomputers
IEEE Transactions on Computers
Process control and scheduling issues for multiprogrammed shared-memory multiprocessors
SOSP '89 Proceedings of the twelfth ACM symposium on Operating systems principles
Memory coherence in shared virtual memory systems
ACM Transactions on Computer Systems (TOCS)
Implementation and performance of Munin
SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
SPLASH: Stanford parallel applications for shared-memory
ACM SIGARCH Computer Architecture News
Lazy release consistency for software distributed shared memory
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Effective distributed scheduling of parallel workloads
Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
An integrated compile-time/run-time software distributed shared memory system
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Compiler and software distributed shared memory support for irregular applications
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Cashmere-2L: software coherent shared memory on a clustered remote-write network
Proceedings of the sixteenth ACM symposium on Operating systems principles
A closer look at coscheduling approaches for a network of workstations
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Accurate data redistribution cost estimation in software distributed shared memory systems
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Memory Channel Network for PCI
IEEE Micro
An Implementation of Interprocedural Bounded Regular Section Analysis
IEEE Transactions on Parallel and Distributed Systems
An Adaptive Approach to Data Placement
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Demand-based coscheduling of parallel jobs on multiprogrammed multiprocessors
Demand-based coscheduling of parallel jobs on multiprogrammed multiprocessors
Accurate data redistribution cost estimation in software distributed shared memory systems
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Scaling irregular parallel codes with minimal programming effort
Proceedings of the 2001 ACM/IEEE conference on Supercomputing
Dyn-MPI: Supporting MPI on Non Dedicated Clusters
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
PROC: Process ReOrdering-Based Coscheduling on Workstation Clusters
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Using multiple energy gears in MPI programs on a power-scalable cluster
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
The MHETA Execution Model for Heterogeneous Clusters
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Minimizing execution time in MPI programs on an energy-constrained, power-scalable cluster
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
A semi-static approach to mapping dynamic iterative tasks onto heterogeneous computing systems
Journal of Parallel and Distributed Computing
HeteroMPI: Towards a message-passing library for heterogeneous networks of computers
Journal of Parallel and Distributed Computing
Dyn-MPI: Supporting MPI on medium-scale, non-dedicated clusters
Journal of Parallel and Distributed Computing
A runtime resolution scheme for priority boost conflict in implicit coscheduling
The Journal of Supercomputing
Adaptive Allocation of Independent Tasks to Maximize Throughput
IEEE Transactions on Parallel and Distributed Systems
Program phase detection and exploitation
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Hi-index | 0.00 |
Networks of workstations (NOWs), which are generally composed of autonomous compute elements networked together, are an attractive parallel computing platform since they offer high performance at low cost. The autonomous nature of the environment, however, often results in inefficient utilization due to load imbalances caused by three primary factors: 1) unequal load (compute or communication) assignment to equally-powerful compute nodes, 2) unequal resources at compute nodes, and 3) multiprogramming. These load imbalances result in idle waiting time on cooperating processes that need to synchronize or communicate data. Additional waiting time may result due to local scheduling decisions in a multiprogrammed environment. In this paper, we present a combined approach of compile-time analysis, run-time load distribution, and operating system scheduler cooperation for improved utilization of available resources in an autonomous NOW. The techniques we propose allow efficient resource utilization by taking into consideration all three causes of load imbalance in addition to locality of access in the process of load distribution. The resulting adaptive load distribution and cooperative scheduling system allows applications to take advantage of parallel resources when available by providing better performance than when the loaded resources are not used at all.