The limited performance benefits of migrating active processes for load sharing
SIGMETRICS '88 Proceedings of the 1988 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Implementing global memory management in a workstation cluster
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Exploiting process lifetime distributions for dynamic load balancing
ACM Transactions on Computer Systems (TOCS)
Availability and utility of idle memory in workstation clusters
SIGMETRICS '99 Proceedings of the 1999 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
The impact of job memory requirements on gang-scheduling performance
ACM SIGMETRICS Performance Evaluation Review
The impact of job arrival patterns on parallel scheduling
ACM SIGMETRICS Performance Evaluation Review
An Opportunity Cost Approach for Job Assignment in a Scalable Computing Cluster
IEEE Transactions on Parallel and Distributed Systems
A hierarchical load-balancing framework for dynamic multithreaded computations
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Operating System Concepts, 4th Ed.
Dynamic Cluster Resource Allocations for Jobs with Known and Unknown Memory Demands
IEEE Transactions on Parallel and Distributed Systems
IPPS '95 Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing
Job Characteristics of a Production Parallel Scientific Workload on the NASA Ames iPSC/860
IPPS '95 Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing
TPF: a dynamic system thrashing protection facility
Software—Practice & Experience
Effects of clock resolution on the scheduling of interactive and soft real-time processes
SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Classifying scheduling policies with respect to unfairness in an M/GI/1
SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Incorporating Job Migration and Network RAM to Share Cluster Memory Resources
HPDC '00 Proceedings of the 9th IEEE International Symposium on High Performance Distributed Computing
Gang Scheduling with Memory Considerations
IPDPS '00 Proceedings of the 14th International Symposium on Parallel and Distributed Processing
Improving Distributed Workload Performance by Sharing Both CPU and Memory Resources
ICDCS '00 Proceedings of the 20th International Conference on Distributed Computing Systems (ICDCS 2000)
Adaptive and Virtual Reconfigurations for Effective Dynamic Job Scheduling in Cluster Systems
ICDCS '02 Proceedings of the 22nd International Conference on Distributed Computing Systems (ICDCS'02)
System Support to Balance the Resource Supply and Demand in High-end Computing
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 10 - Volume 11
Journal of Parallel and Distributed Computing
Effectively Utilizing Global Cluster Memory for Large Data-Intensive Parallel Programs
IEEE Transactions on Parallel and Distributed Systems
Winner Price Monotonicity for Approximated Combinatorial Auctions
WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 03
FUZZ-IEEE'09 Proceedings of the 18th international conference on Fuzzy Systems
An experimental analysis of biased parallel greedy approximation for combinatorial auctions
International Journal of Intelligent Information and Database Systems
In a cluster system with dynamic load sharing support, whether a job is submitted or migrated to a workstation is determined by the CPU and memory resources available on that workstation at the time [21]. In such a system, a small number of running jobs with unexpectedly large memory requirements may significantly increase the queuing delays of the remaining jobs with normal memory requirements, slowing down the execution of each individual job and decreasing system throughput. We call this phenomenon the job blocking problem because the big jobs block the execution pace of the majority of jobs in the cluster. Since the memory demands of jobs may not be known in advance and may change dynamically, unsuitable job submissions or migrations are likely to cause the blocking problem, and existing load sharing schemes cannot handle it effectively. We propose two schemes to address this problem. The first scheme, network RAM supported load sharing, combines job migrations with network RAM: it uses remote execution to initially allocate a job to the most lightly loaded workstation and, if necessary, uses network RAM to provide the job with a larger global memory space than would otherwise be available. This scheme has the merits of both job migrations and network RAM, and our experiments show its effectiveness and scalability. However, it requires a network RAM facility in the cluster, which may cause additional overhead and increase cluster network traffic. To address this limitation, we propose a second scheme, memory reservation, which is incorporated with dynamic load sharing and adaptively reserves a small set of workstations to provide special services to jobs demanding large memory allocations. As soon as the blocking problem is resolved by the memory reservation scheme, the system adaptively switches back to the normal load sharing state.
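The placement logic of the first scheme described above can be illustrated with a minimal sketch. This is not the authors' implementation: the `Workstation` fields, the `cpu_threshold` parameter, the use of a running-job count as the load measure, and the greedy choice of memory donors are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Workstation:
    name: str
    running_jobs: int    # hypothetical proxy for CPU load
    idle_memory_mb: int  # memory currently free on this node


def place_job(
    nodes: List[Workstation], demand_mb: int, cpu_threshold: int = 4
) -> Optional[Tuple[str, int, Dict[str, int]]]:
    """Return (host name, local MB used, {donor name: MB borrowed}) or None."""
    # Only nodes below the (assumed) CPU threshold may accept a new job.
    candidates = [n for n in nodes if n.running_jobs < cpu_threshold]
    if not candidates:
        return None

    # Remote execution: pick the most lightly loaded workstation,
    # breaking ties in favour of more idle memory.
    host = min(candidates, key=lambda n: (n.running_jobs, -n.idle_memory_mb))

    local = min(demand_mb, host.idle_memory_mb)
    shortfall = demand_mb - local

    # Network RAM: cover any shortfall with idle memory on other nodes,
    # taking from the nodes with the most idle memory first (an assumption).
    borrowed: Dict[str, int] = {}
    for donor in sorted(nodes, key=lambda n: -n.idle_memory_mb):
        if shortfall == 0:
            break
        if donor is host:
            continue
        take = min(shortfall, donor.idle_memory_mb)
        if take > 0:
            borrowed[donor.name] = take
            shortfall -= take

    if shortfall > 0:
        return None  # even the cluster-wide idle memory is insufficient
    return host.name, local, borrowed


cluster = [
    Workstation("A", running_jobs=1, idle_memory_mb=100),
    Workstation("B", running_jobs=3, idle_memory_mb=500),
    Workstation("C", running_jobs=2, idle_memory_mb=200),
]
print(place_job(cluster, 400))
```

In this sketch a job whose demand fits in the host's idle memory borrows nothing, so remote memory is touched only when local memory runs out, which mirrors the "if necessary" condition in the text.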
Both schemes target large data-intensive jobs in clusters and are mutually complementary. The network RAM supported load sharing scheme can fully utilize the cluster's global memory space, while the memory reservation scheme has the advantages of simple implementation and low overhead. Thus, both can be effective alternatives that are practically deployable in cluster computing under different system conditions.
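The adaptive switching behind the memory reservation scheme can likewise be sketched in a few lines. The `Cluster` class, the `max_reserved` limit, the count of blocked large jobs used as the trigger, and the rule of reserving the first workstations in the list are hypothetical choices for illustration; the abstract does not specify how blocking is detected or which workstations are selected.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Cluster:
    nodes: List[str]
    reserved: Set[str] = field(default_factory=set)

    def update(self, blocked_large_jobs: int, max_reserved: int = 2) -> None:
        """Enter or leave the reservation state based on observed blocking."""
        if blocked_large_jobs > 0:
            # Reserve a small set of workstations, always leaving at least
            # one unreserved node for jobs with normal memory demands.
            want = min(max_reserved, blocked_large_jobs, len(self.nodes) - 1)
            for name in self.nodes:
                if len(self.reserved) >= want:
                    break
                self.reserved.add(name)
        else:
            # Blocking resolved: switch back to normal load sharing.
            self.reserved.clear()

    def eligible(self, is_large: bool) -> List[str]:
        """Workstations that may accept a job of the given class."""
        if is_large and self.reserved:
            return [n for n in self.nodes if n in self.reserved]
        return [n for n in self.nodes if n not in self.reserved]


cluster = Cluster(["w1", "w2", "w3", "w4"])
cluster.update(blocked_large_jobs=1)
print(cluster.eligible(is_large=True), cluster.eligible(is_large=False))
cluster.update(blocked_large_jobs=0)
print(cluster.eligible(is_large=True))
```

The sketch needs no shared memory facility, only a flag per workstation, which is consistent with the simplicity and low overhead claimed for this scheme.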