Quelling Queue Storms

Authors:
Stephen D. Kleban; Scott H. Clearwater
Venue:
HPDC '03 Proceedings of the 12th IEEE International Symposium on High Performance Distributed Computing
Year:
2003

Abstract

This paper characterizes "queue storms" in supercomputer systems and discusses methods for quelling them. Queue storms are anomalously long queues whose occurrence depends on the job size mix, the queuing system, the machine size, and correlations and dependencies between job submissions. We use synthetic data generated from actual job logs of the ASCI Blue Mountain supercomputer, combined with various long-range dependence structures. We show the distribution of times until the first storm occurs. In a sense, this is when the machine becomes obsolete, because it marks the first time the machine fails to provide satisfactory turnaround. To overcome queue storms, more resources are needed, even if they appear superfluous most of the time. We present two methods, including a grid-based solution, for reducing these correlations and their resulting effect on the size and frequency of queue storms.
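The abstract does not describe the paper's workload generator or queue model. What follows is a minimal sketch built on stated assumptions: a discrete-time model in which work arrives from aggregated ON/OFF sources with Pareto-distributed period lengths, and the machine drains a fixed `capacity` units of work per step. Aggregated ON/OFF sources of this kind are a standard way to produce long-range dependence; they are swapped in here and are not taken from the paper. The backlog follows the Lindley recursion, and a "storm" is taken to be the first step at which the backlog exceeds a chosen `threshold`. The backlog is measured in units of work rather than job count, which is a simplification. All function names, the threshold, and the parameter values are hypothetical.

```python
from __future__ import annotations

import random
import statistics


def pareto_period(rng: random.Random, alpha: float) -> int:
    """Integer ON/OFF period length with a Pareto tail (at least 1 step)."""
    # paretovariate(alpha) returns floats >= 1.0 with tail index alpha.
    return int(rng.paretovariate(alpha))


def onoff_arrivals(n_sources: int, steps: int, alpha: float,
                   rate_on: int, rng: random.Random) -> list[int]:
    """Work arriving per step from n_sources independent ON/OFF sources.

    Each source alternates ON and OFF periods whose lengths are Pareto
    distributed; while ON it submits rate_on units of work per step.
    For 1 < alpha < 2 the aggregated arrivals are long-range dependent.
    (Hypothetical stand-in for the paper's synthetic workload.)
    """
    arrivals = [0] * steps
    for _ in range(n_sources):
        t = 0
        on = rng.random() < 0.5  # random initial state for this source
        while t < steps:
            length = pareto_period(rng, alpha)
            if on:
                for s in range(t, min(t + length, steps)):
                    arrivals[s] += rate_on
            t += length
            on = not on
    return arrivals


def time_to_first_storm(arrivals: list[int], capacity: int,
                        threshold: int) -> int | None:
    """First step at which the backlog exceeds threshold, or None if never.

    Backlog follows the Lindley recursion Q <- max(0, Q + A[t] - capacity).
    """
    q = 0
    for t, a in enumerate(arrivals):
        q = max(0, q + a - capacity)
        if q > threshold:
            return t
    return None


if __name__ == "__main__":
    rng = random.Random(2003)
    steps, threshold = 5000, 200
    # 20 sources, each ON about half the time: mean load of roughly 10 units/step.
    traces = [onoff_arrivals(20, steps, 1.4, 1, rng) for _ in range(20)]
    # Reuse the same traces so the two capacities are compared on equal inputs.
    for capacity in (12, 15):
        times = [time_to_first_storm(a, capacity, threshold) for a in traces]
        stormy = [t for t in times if t is not None]
        median = statistics.median(stormy) if stormy else None
        print(f"capacity={capacity}: {len(stormy)}/{len(traces)} runs stormed, "
              f"median first-storm step={median}")
```

Collecting `time_to_first_storm` over many traces gives an empirical distribution of times to the first storm for one capacity. Comparing two capacities on the same traces is one way to see, within this toy model, why headroom that looks superfluous on average can still matter during correlated bursts.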