Improving Goodput by Coscheduling CPU and Network Capacity

  • Authors:
  • Jim Basney; Miron Livny

  • Affiliations:
  • Computer Sciences Department, University of Wisconsin-Madison, Wisconsin, U.S.A. (both authors)

  • Venue:
  • International Journal of High Performance Computing Applications
  • Year:
  • 1999


Abstract

In a cluster computing environment, executable, checkpoint, and data files must be transferred between application submission and execution sites. As the memory footprint of cluster applications increases, saving and restoring the state of a computation in such an environment may require substantial network resources at both the start and the end of a CPU allocation. During the allocation, the application may also consume network bandwidth to periodically transfer a checkpoint back to the submission site or checkpoint server and to access remote data files. Under most circumstances, the application cannot use the allocated CPU while these transfers are in progress. Furthermore, if the application is unable to transfer a checkpoint or successfully migrate at preemption time, work already accomplished by the application is lost. The authors define goodput as the allocation time during which a remotely executing application uses the CPU to make forward progress. Goodput can be significantly less than allocated throughput due to network activity. The authors are currently engaged in an effort to develop coscheduling techniques for CPU and network resources that will improve the goodput delivered by Condor pools. They report the techniques they have developed so far, how those techniques were implemented in Condor, and their preliminary impact on the goodput of the authors' production Condor pool.
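The goodput definition above can be illustrated with a minimal sketch (not from the paper; the function names and example numbers are hypothetical). It treats goodput as the allocated time minus the time spent on checkpoint and data transfers, during which the CPU makes no forward progress, and treats an allocation whose state is lost at preemption as delivering zero goodput:

```python
def goodput(allocation_secs: float, transfer_secs: float,
            state_lost: bool = False) -> float:
    """Goodput: the portion of an allocation during which the
    remotely executing application uses the CPU to make forward
    progress.  Time spent transferring executables, checkpoints,
    and data files (when the CPU sits idle) does not count; if the
    application fails to checkpoint or migrate at preemption time,
    the whole allocation's work is lost.
    """
    if state_lost:
        return 0.0
    return max(allocation_secs - transfer_secs, 0.0)


# Illustrative numbers only: a one-hour allocation with six minutes
# of checkpoint/data transfers delivers 90% of the allocated time
# as goodput.
alloc = 3600.0
transfers = 360.0
g = goodput(alloc, transfers)
print(g, g / alloc)  # 3240.0 0.9
```

The same function shows why coscheduling matters: shrinking `transfer_secs` (or avoiding lost allocations) is what closes the gap between allocated time and goodput.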