Dynamic Load Sharing With Unknown Memory Demands in Clusters

Venue: ICDCS '01 Proceedings of the 21st International Conference on Distributed Computing Systems
Year: 2001

Citing 11

Analysis of the impact of memory in distributed parallel processing systems
SIGMETRICS '94 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems

Adaptive page replacement based on memory reference behavior
SIGMETRICS '97 Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems

Exploiting process lifetime distributions for dynamic load balancing
ACM Transactions on Computer Systems (TOCS)

A scalable parallel cell-projection volume rendering algorithm for three-dimensional unstructured data
PRS '97 Proceedings of the IEEE symposium on Parallel rendering

Availability and utility of idle memory in workstation clusters
SIGMETRICS '99 Proceedings of the 1999 ACM SIGMETRICS international conference on Measurement and modeling of computer systems

Cache-optimal methods for bit-reversals
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing

Improving memory performance of sorting algorithms
Journal of Experimental Algorithmics (JEA)

Job Characteristics of a Production Parallel Scientific Workload on the NASA Ames iPSC/860
IPPS '95 Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing

Incorporating Job Migration and Network RAM to Share Cluster Memory Resources
HPDC '00 Proceedings of the 9th IEEE International Symposium on High Performance Distributed Computing

Gang Scheduling with Memory Considerations
IPDPS '00 Proceedings of the 14th International Symposium on Parallel and Distributed Processing

Improving Distributed Workload Performance by Sharing Both CPU and Memory Resources
ICDCS '00 Proceedings of the 20th International Conference on Distributed Computing Systems (ICDCS 2000)

Cited 5

Dynamic Cluster Resource Allocations for Jobs with Known and Unknown Memory Demands
IEEE Transactions on Parallel and Distributed Systems

Performance Analysis of a Distributed Question/Answering System
IEEE Transactions on Parallel and Distributed Systems

Memory latency consideration for load sharing on heterogeneous network of workstations
Journal of Systems Architecture: the EUROMICRO Journal

Dynamic cluster resource allocations for jobs with known memory demands
Proceedings of the International Conference and Workshop on Emerging Trends in Technology

Towards a green cluster through dynamic remapping of virtual machines
Future Generation Computer Systems

Abstract

A compute farm is a pool of clustered workstations that provides high-performance computing services for CPU-intensive, memory-intensive, and I/O-active jobs in batch mode. Existing load sharing schemes with memory considerations assume that jobs' memory demands are known in advance or can be predicted from users' hints. This assumption greatly simplifies the design and implementation of load sharing schemes, but it is not desirable in practice. To address this concern, we present three new results and contributions in this study. (1) By instrumenting the Linux kernel, we have collected different types of workload execution traces to quantitatively characterize job interactions, and have modeled page fault behavior as a function of the overloaded memory size and the amount of the jobs' I/O activity. (2) Based on experimental results and collected dynamic system information, we have built a simulation model that accurately emulates memory system operations and job migrations with virtual memory considerations. (3) We have proposed a memory-centric load sharing scheme and its variations to effectively process dynamic memory allocation demands. The scheme aims to minimize the execution time of each individual job by dynamically migrating and remotely submitting jobs, which eliminates or reduces page faults and reduces the queuing time for CPU services. Using trace-driven simulations, we have examined these load sharing policies to show their effectiveness.