The XtreemOS jScheduler: using self-scheduling techniques in large computing architectures
LASCO'08 First USENIX Workshop on Large-Scale Computing
GRID '07 Proceedings of the 8th IEEE/ACM International Conference on Grid Computing
Using Templates to Predict Execution Time of Scientific Workflow Applications in the Grid
CCGRID '09 Proceedings of the 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid
Predicting the execution time of grid workflow applications through local learning
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
A job self-scheduling policy for HPC infrastructures
JSSPP'07 Proceedings of the 13th international conference on Job scheduling strategies for parallel processing
Service control with the preemptive parallel job scheduler Scojo-PECT
Cluster Computing
Future Generation Computer Systems
Local learning has been proposed as a common framework for predicting both application run times and queue wait times from workload traces [8]. Queue wait time proves harder and more expensive to predict because its distance calculations typically involve not only job attributes but also resource states. This paper investigates methods and algorithms that improve the accuracy and performance of queue wait time prediction. First, so-called "local tuning" is adopted: parameters are tuned separately for each training subset obtained by partitioning on a pivot attribute (e.g., group or queue name). A bias-variance analysis of the error is conducted for local tuning and its global counterpart, which tunes parameters on the whole training set, and a method is then developed to select the tuning type adaptively based on the generalization error and its bias-variance decomposition. Second, an efficient search tree structure called the "M-Tree" is integrated into the algorithm to speed up k-nearest-neighbor search. The proposed methods and algorithms are evaluated on real-world workload traces collected from the NIKHEF production cluster on the LHC Computing Grid and from Blue Horizon at the San Diego Supercomputer Center (SDSC). The results show that adaptive tuning reduces the average prediction error by 3 to 10 percent compared with global tuning, and that the M-Tree nearest-neighbor search is up to 8 times faster than sequential search.
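The local-vs-global tuning idea can be illustrated with a minimal sketch (this is an assumption-laden illustration, not the paper's implementation): the training set is split on a hypothetical pivot attribute (here the queue name), and the k-NN parameter k is tuned per subset against a simple hold-out split. All function names, the 70/30 split, and the toy data layout are assumptions made for illustration.

```python
# Illustrative sketch of local vs. global parameter tuning for k-NN
# prediction of job run/wait times; not the paper's implementation.
import math
from collections import defaultdict

def knn_predict(train, x, k):
    """Predict a target as the mean of the k nearest training targets."""
    neighbors = sorted(train, key=lambda rec: math.dist(rec[0], x))[:k]
    return sum(y for _, y in neighbors) / len(neighbors)

def tune_k(train, valid, ks):
    """Global tuning: pick the k with the lowest validation MAE."""
    def mae(k):
        return sum(abs(knn_predict(train, x, k) - y) for x, y in valid) / len(valid)
    return min(ks, key=mae)

def local_tune(records, ks):
    """Local tuning: partition on a pivot attribute (queue name) and
    tune k separately inside each subset."""
    groups = defaultdict(list)
    for queue, x, y in records:
        groups[queue].append((x, y))
    tuned = {}
    for queue, recs in groups.items():
        cut = max(1, int(len(recs) * 0.7))          # simple hold-out split
        train, valid = recs[:cut], recs[cut:] or recs[:1]
        tuned[queue] = tune_k(train, valid, [k for k in ks if k <= len(train)])
    return tuned
```

The adaptive selection described in the abstract would then choose, per subset, between the locally tuned parameters and the globally tuned ones, based on an estimate of the generalization error and its bias-variance decomposition.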