In a cluster computing environment, executable, checkpoint, and data files must be transferred between application submission and execution sites. As the memory footprint of cluster applications increases, saving and restoring the state of a computation in such an environment may require substantial network resources at both the start and the end of a CPU allocation. During the allocation, the application may also consume network bandwidth to periodically transfer a checkpoint back to the submission site or a checkpoint server and to access remote data files. Under most circumstances, the application cannot use the allocated CPU while these transfers are in progress. Furthermore, if the application is unable to transfer a checkpoint or to migrate successfully at preemption time, the work it has already accomplished is lost. The authors define goodput as the portion of the allocation time during which a remotely executing application uses the CPU to make forward progress. Because of network activity, goodput can be significantly less than allocated throughput. The authors are currently engaged in an effort to develop coscheduling techniques for CPU and network resources that will improve the goodput delivered by Condor pools. They report the techniques they have developed so far, how these were implemented in Condor, and their preliminary impact on the goodput of the authors' production Condor pool.
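The relationship between allocated time and goodput described in the abstract can be illustrated with a small sketch. It is not the authors' accounting method: the record layout, the field names (`allocated`, `transfer_blocked`, `lost_work`), and the simple subtraction model are assumptions made here for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Allocation:
    """One CPU allocation, times in seconds (hypothetical record layout)."""
    allocated: float          # wall-clock time the CPU was allocated to the job
    transfer_blocked: float   # time the CPU sat idle during file/checkpoint transfers
    lost_work: float = 0.0    # computed work discarded, e.g. after a failed checkpoint


def goodput(a: Allocation) -> float:
    """Allocation time that produced forward progress (assumed subtraction model)."""
    useful = a.allocated - a.transfer_blocked - a.lost_work
    return max(useful, 0.0)


def goodput_fraction(allocs: Iterable[Allocation]) -> float:
    """Share of all allocated time that counted as goodput."""
    allocs = list(allocs)
    total = sum(a.allocated for a in allocs)
    if total == 0:
        return 0.0
    return sum(goodput(a) for a in allocs) / total


if __name__ == "__main__":
    history = [
        # One hour allocated, ten minutes spent on transfers.
        Allocation(allocated=3600, transfer_blocked=600),
        # Preempted without a successful checkpoint: all remaining work is lost.
        Allocation(allocated=1800, transfer_blocked=300, lost_work=1500),
    ]
    print(f"goodput fraction: {goodput_fraction(history):.2f}")
```

In this made-up history the second allocation contributes no goodput at all, which is the case the abstract warns about when a checkpoint cannot be transferred at preemption time.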