This paper presents scheduler extensions that help parallel programs adapt to the execution conditions of non-dedicated computational farms with limited memory resources. The techniques aim to prevent thrashing and to co-schedule communicating threads, using two disjoint yet cooperating extensions to the kernel scheduler. A thrashing prevention module enables memory-bound programs to adapt to memory shortage by suspending their threads at selected points of execution. Thread suspension ensures that parallel jobs, which are assumed to run as guests on the nodes of the computational farm, do not over-commit memory at memory allocation points. In the event of thrashing, parallel jobs are the first to release memory, which helps local resident jobs make progress. Adaptation is implemented using a shared-memory interface in the /proc filesystem and upcalls from the kernel to user space. On an orthogonal axis, co-scheduling is implemented in the kernel with a heuristic that periodically boosts the priority of communicating threads.

Using experiments on a cluster of workstations, we show that when a guest parallel job competes with general-purpose interactive, I/O-intensive, or CPU- and memory-intensive load on the nodes of the cluster, thrashing prevention drastically reduces the slowdown of the job at memory utilization levels of 20% or higher. The slowdown of parallel jobs is reduced by up to a factor of 7. Co-scheduling provides a limited performance improvement at memory utilization levels below 20%, but has no significant effect at higher memory utilization levels.
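
The thrashing prevention policy described in the abstract (suspend guest threads at allocation points rather than over-commit memory, and have guest jobs release memory first when thrashing occurs) can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions: the names `Node`, `guest_alloc`, and `on_thrashing` are hypothetical, memory is tracked as plain megabyte counters, and the /proc shared-memory interface and kernel upcalls mentioned in the abstract are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical model of one workstation in the farm (sizes in MB)."""
    total_mb: int
    local_used_mb: int        # memory held by local resident jobs
    guest_used_mb: int = 0    # memory held by the guest parallel job

    def free_mb(self) -> int:
        return self.total_mb - self.local_used_mb - self.guest_used_mb


def guest_alloc(node: Node, request_mb: int) -> str:
    """Decide what a guest thread does at a memory allocation point.

    Assumption: the guest proceeds only if the request fits in free
    memory; otherwise the thread is suspended instead of over-committing.
    """
    if request_mb <= node.free_mb():
        node.guest_used_mb += request_mb
        return "proceed"
    return "suspend"


def on_thrashing(node: Node) -> int:
    """Guest job releases its memory first; returns the MB released.

    Assumption: the guest gives up everything it holds, which is the
    simplest possible release policy for illustration.
    """
    released = node.guest_used_mb
    node.guest_used_mb = 0
    return released


if __name__ == "__main__":
    node = Node(total_mb=1024, local_used_mb=600)
    print(guest_alloc(node, 300))   # fits in the 424 MB that are free
    print(guest_alloc(node, 200))   # would exceed the remaining 124 MB
    print(on_thrashing(node))       # guest memory handed back to local jobs
```

In this toy model the second request is refused because it would exceed free memory, which mirrors the idea of suspending at allocation points; a real implementation would resume the suspended thread once memory becomes available.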