Trace-based evaluation of job runtime and queue wait time predictions in grids

Authors:
Ozan Sonmez;Nezih Yigitbasi;Alexandru Iosup;Dick Epema
Affiliations:
Delft University of Technology, Delft, Netherlands;Delft University of Technology, Delft, Netherlands;Delft University of Technology, Delft, Netherlands;Delft University of Technology, Delft, Netherlands
Venue:
Proceedings of the 18th ACM international symposium on High performance distributed computing
Year:
2009



Abstract

Large-scale distributed computing systems such as grids are serving a growing number of scientists. These environments bring not only the advantages of an economy of scale, but also the challenges of resource and workload heterogeneity. A consequence of these two forms of heterogeneity is that job runtimes and queue wait times are highly variable, which generally reduces system performance and makes grids difficult for the average scientist to use. The prediction of job runtimes and queue wait times has been widely studied for parallel environments. However, there is no detailed investigation of how the proposed prediction methods perform in grids, whose resource structure and workload characteristics are very different from those of parallel systems. In this paper, we assess the performance and benefit of predicting job runtimes and queue wait times in grids based on traces gathered from various research and production grid environments. First, we evaluate the performance of simple yet widely used time series prediction methods and the effect of applying them to different types of job classes (e.g., all jobs submitted by single users or to single sites). Then, we investigate the performance of two kinds of queue wait time prediction methods for grids. Last, we investigate whether prediction-based grid-level scheduling policies can perform better than policies that do not use predictions.
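
To make the idea of applying simple time series predictors to job classes concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not name the predictors, so the three shown here (last value, running mean, sliding-window median) are assumed examples of "simple yet widely used" methods. The record fields `user`, `site`, and `runtime`, and the function `evaluate`, are hypothetical. The sketch also assumes that every earlier job in a class has finished before the next one is submitted, which a real trace replay would need to check.

```python
from collections import defaultdict
from statistics import mean, median


def last_value(history):
    """Predict the runtime of the most recent job in the class."""
    return history[-1]


def running_mean(history):
    """Predict the mean of all runtimes seen so far in the class."""
    return mean(history)


def sliding_median(history, window=5):
    """Predict the median of the last `window` runtimes in the class."""
    return median(history[-window:])


def evaluate(jobs, key, predictor):
    """Replay jobs in submission order and return the mean absolute error.

    jobs: iterable of dicts with hypothetical fields 'user', 'site', 'runtime'.
    key: field that defines the job class, e.g. 'user' or 'site'.
    Jobs with no earlier job in their class are not predicted.
    """
    history = defaultdict(list)
    errors = []
    for job in jobs:
        past = history[job[key]]
        if past:
            errors.append(abs(predictor(past) - job["runtime"]))
        past.append(job["runtime"])
    return mean(errors) if errors else None


if __name__ == "__main__":
    # Made-up records for illustration only; not taken from any real trace.
    trace = [
        {"user": "a", "site": "s1", "runtime": 10},
        {"user": "a", "site": "s1", "runtime": 20},
        {"user": "b", "site": "s1", "runtime": 100},
        {"user": "a", "site": "s1", "runtime": 30},
    ]
    for key in ("user", "site"):
        for predictor in (last_value, running_mean, sliding_median):
            print(key, predictor.__name__, evaluate(trace, key, predictor))
```

Grouping by a different key changes which past jobs feed each prediction, which is the effect of job class choice that the abstract refers to. On the made-up records above, per-user histories keep the long job of user `b` from distorting predictions for user `a`, whereas per-site histories mix the two.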