Dynamic Cluster Resource Allocations for Jobs with Known and Unknown Memory Demands

Authors:
Li Xiao;Songqing Chen;Xiaodong Zhang
Affiliations:
College of William and Mary, Williamsburg, VA;College of William and Mary, Williamsburg, VA;College of William and Mary, Williamsburg, VA
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2002



Abstract

The cluster system we consider for load sharing is a compute farm: a pool of networked server nodes that provides high-performance computing for CPU-intensive, memory-intensive, and I/O-active jobs in batch mode. Existing resource management systems mainly aim to balance CPU load among server nodes. As CPU chips advance rapidly, improvements in memory and disk access speed lag significantly behind CPU speed, which increases the penalty of data movement, such as page faults and I/O operations, relative to normal CPU operations. To reduce the memory resource contention caused by page faults and I/O activities, we have developed and examined load sharing policies that consider effective usage of global memory in addition to CPU load balancing in clusters. We study two types of application workloads: 1) memory demands are known in advance or are predictable and 2) memory demands are unknown and change dynamically during execution. Besides using workload traces with known memory demands, we have also instrumented the kernel to collect different types of workload execution traces that capture dynamic memory access patterns. Through different groups of trace-driven simulations, we show that our proposed policies can effectively improve overall job execution performance by making good use of both CPU and memory resources, with known and unknown memory demands.
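
The idea of weighing memory alongside CPU load when placing a job can be illustrated with a small sketch. This is not the policy evaluated in the paper; the `Node` fields, the `pick_node` function, and the tie-breaking rules are all hypothetical and chosen only to show how a memory-aware choice can differ from a CPU-only one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical view of one server node's current state."""
    name: str
    cpu_jobs: int       # number of running jobs, used as a simple CPU load index
    mem_total_mb: int
    mem_used_mb: int

    @property
    def mem_free_mb(self) -> int:
        return self.mem_total_mb - self.mem_used_mb


def pick_node(nodes: List[Node], mem_demand_mb: Optional[int]) -> Node:
    """Choose a node for a new job (illustrative rule, assumed for this sketch).

    Known demand: among nodes whose free memory covers the demand,
    take the one with the fewest running jobs.
    Unknown demand (None), or no node fits: take the node with the
    most free memory, breaking ties by lower CPU load.
    """
    if mem_demand_mb is not None:
        fitting = [n for n in nodes if n.mem_free_mb >= mem_demand_mb]
        if fitting:
            return min(fitting, key=lambda n: (n.cpu_jobs, -n.mem_free_mb))
    return max(nodes, key=lambda n: (n.mem_free_mb, -n.cpu_jobs))


# Example cluster state (made-up numbers).
nodes = [
    Node("a", cpu_jobs=1, mem_total_mb=1024, mem_used_mb=900),
    Node("b", cpu_jobs=3, mem_total_mb=1024, mem_used_mb=200),
    Node("c", cpu_jobs=2, mem_total_mb=1024, mem_used_mb=500),
]
known = pick_node(nodes, 400)     # job with a known 400 MB demand
unknown = pick_node(nodes, None)  # job whose demand is not known in advance
```

In this example a CPU-only rule would send the 400 MB job to node `a`, the least loaded node, where it would not fit in free memory and would likely page. The sketch instead skips `a` and picks the least loaded node that can hold the job. For a job with unknown demand it falls back to the node with the most free memory, which is only one of several plausible heuristics.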