Incorporating Job Migration and Network RAM to Share Cluster Memory Resources

  • Authors:
  • Li Xiao; Xiaodong Zhang; Stefan A. Kubricht

  • Venue:
  • HPDC '00 Proceedings of the 9th IEEE International Symposium on High Performance Distributed Computing
  • Year:
  • 2000

Abstract

Job migrations and network RAM are two major approaches for effectively using global memory resources in a workstation cluster, aimed at reducing page faults in each local workstation and improving the overall performance of cluster computing. Using remote executions or preemptive migrations, a load sharing system is able to migrate a job from a workstation without sufficient memory space to a lightly loaded workstation with large idle memory space for the migrated job. In a network RAM system, if a job cannot find sufficient memory space for its working sets, it will utilize idle memory space from other workstations in the cluster through remote paging. Conducting trace-driven simulations, we have compared the performance and trade-offs of the two approaches and their impacts on job execution time and cluster scalability. Our study indicates that job-migration-based load sharing schemes are able to balance executions of jobs in a cluster well, while network RAM is able to satisfy data-intensive jobs which may not be migratable by sharing all the idle memory resources in a cluster. We also show that a network RAM cluster of workstations is scalable only if the network is sufficiently fast. Finally, we propose an improved load-sharing scheme by combining job migrations with network RAM for cluster computing. This scheme uses remote execution to initially allocate a job to the most lightly loaded workstation and, if necessary, network RAM to provide a larger memory space for the job than would be available otherwise. The improved scheme has the merits of both job migrations and network RAM. Our experiments show its effectiveness and scalability for cluster computing.
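
The combined scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Workstation`, `Placement`, and `place_job` are hypothetical, "most lightly loaded" is approximated here by the largest idle memory, and the greedy choice of lenders is an assumption made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Workstation:
    """Hypothetical cluster node, tracked only by memory usage (in MB)."""
    name: str
    total_mb: int
    used_mb: int = 0

    @property
    def idle_mb(self) -> int:
        return self.total_mb - self.used_mb


@dataclass
class Placement:
    host: str
    local_mb: int
    remote_mb: dict = field(default_factory=dict)  # lender name -> MB borrowed
    disk_mb: int = 0  # demand that neither local nor remote memory could hold


def place_job(nodes, demand_mb):
    """Place a job by remote execution, then top up with network RAM."""
    # Step 1 (remote execution): pick the node with the most idle memory.
    # Assumption: idle memory stands in for "most lightly loaded".
    host = max(nodes, key=lambda n: n.idle_mb)
    local = min(demand_mb, host.idle_mb)
    host.used_mb += local
    placement = Placement(host=host.name, local_mb=local)

    # Step 2 (network RAM): borrow idle memory from other nodes if needed,
    # asking the nodes with the most idle memory first.
    shortfall = demand_mb - local
    lenders = sorted((n for n in nodes if n is not host),
                     key=lambda n: n.idle_mb, reverse=True)
    for lender in lenders:
        if shortfall == 0:
            break
        borrowed = min(shortfall, lender.idle_mb)
        if borrowed > 0:
            lender.used_mb += borrowed
            placement.remote_mb[lender.name] = borrowed
            shortfall -= borrowed

    # Whatever is still unmet would have to page to local disk.
    placement.disk_mb = shortfall
    return placement


cluster = [
    Workstation("ws1", total_mb=128, used_mb=100),
    Workstation("ws2", total_mb=128, used_mb=32),
    Workstation("ws3", total_mb=128, used_mb=64),
]
result = place_job(cluster, demand_mb=140)
print(result)
```

In this example the job needs more memory than any single workstation has idle, so it runs on the node with the most idle memory and the remainder of its demand is served from another node's idle memory through remote paging, rather than from local disk.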