The XtreemOS jScheduler: using self-scheduling techniques in large computing architectures
LASCO'08 First USENIX Workshop on Large-Scale Computing
GRID '07 Proceedings of the 8th IEEE/ACM International Conference on Grid Computing
Using Templates to Predict Execution Time of Scientific Workflow Applications in the Grid
CCGRID '09 Proceedings of the 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid
Predicting the execution time of grid workflow applications through local learning
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
A job self-scheduling policy for HPC infrastructures
JSSPP'07 Proceedings of the 13th international conference on Job scheduling strategies for parallel processing
Service control with the preemptive parallel job scheduler Scojo-PECT
Cluster Computing
Future Generation Computer Systems
Local learning has been proposed as a common framework for predicting both application run times and queue wait times from workload traces [8]. Queue wait time proves harder and more expensive to predict because its distance calculations typically involve not only job attributes but also resource states. This paper investigates methods and algorithms that improve the accuracy and performance of queue wait time prediction. First, so-called "local tuning" is adopted: parameters are tuned separately for each training subset obtained by partitioning on a pivot attribute (e.g., group or queue name). A bias-variance analysis of the error is conducted for local tuning and its global counterpart, which tunes parameters on the whole training set, and a method is then developed to select the tuning type adaptively based on the generalization error and its bias-variance decomposition. Second, an efficient search tree structure called the "M-Tree" is integrated into the algorithm to speed up k-nearest-neighbor search. The proposed methods and algorithms are evaluated on real-world workload traces collected from the NIKHEF production cluster on the LHC Computing Grid and from Blue Horizon at the San Diego Supercomputer Center (SDSC). The results show that adaptive tuning reduces the average prediction error by 3 to 10 percent compared with global tuning, and that the M-Tree nearest-neighbor search is up to 8 times faster than sequential search.
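The local-vs-global tuning idea can be illustrated with a minimal sketch (this is an assumption-laden illustration, not the paper's implementation): the training set is split on a hypothetical pivot attribute (here the queue name), and the k-NN parameter k is tuned per subset against a simple hold-out split. All function names, the 70/30 split, and the toy data layout are assumptions made for illustration.

```python
# Illustrative sketch of local vs. global parameter tuning for k-NN
# prediction of job run/wait times; not the paper's implementation.
import math
from collections import defaultdict

def knn_predict(train, x, k):
    """Predict a target as the mean of the k nearest training targets."""
    neighbors = sorted(train, key=lambda rec: math.dist(rec[0], x))[:k]
    return sum(y for _, y in neighbors) / len(neighbors)

def tune_k(train, valid, ks):
    """Global tuning: pick the k with the lowest validation MAE."""
    def mae(k):
        return sum(abs(knn_predict(train, x, k) - y) for x, y in valid) / len(valid)
    return min(ks, key=mae)

def local_tune(records, ks):
    """Local tuning: partition on a pivot attribute (queue name) and
    tune k separately inside each subset."""
    groups = defaultdict(list)
    for queue, x, y in records:
        groups[queue].append((x, y))
    tuned = {}
    for queue, recs in groups.items():
        cut = max(1, int(len(recs) * 0.7))          # simple hold-out split
        train, valid = recs[:cut], recs[cut:] or recs[:1]
        tuned[queue] = tune_k(train, valid, [k for k in ks if k <= len(train)])
    return tuned
```

The adaptive selection described in the abstract would then choose, per subset, between the locally tuned parameters and the globally tuned ones, based on an estimate of the generalization error and its bias-variance decomposition.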