Estimates of task characteristics such as runtime, disk space, and memory consumption are commonly used by scheduling algorithms and resource provisioning techniques to achieve successful and efficient workflow executions. These methods assume that accurate estimates are available, but in production systems such estimates are hard to compute with good accuracy. In this work, we first profile three real scientific workflows, collecting fine-grained information such as process I/O, runtime, memory usage, and CPU utilization. We then propose a method to automatically characterize workflow task needs based on these profiles. Our method estimates task runtime, disk space, and memory consumption from the size of the tasks' input data. It looks for correlations among the parameters of a dataset; if no correlation is found, it divides the dataset into smaller subsets using a clustering technique. When a parameter is correlated with the input data size, its estimate is based on the ratio of that parameter to the input data size; otherwise, it is based on the parameter's mean value. However, because tasks in scientific workflows depend on one another, an error in one estimate propagates to the estimates of downstream tasks, producing a chain of estimation errors. To correct such errors, we propose an online estimation process based on the MAPE-K loop (Monitor, Analyze, Plan, Execute over a shared Knowledge base), in which task executions are continuously monitored and estimates are updated accordingly. Experimental results show that the online estimation process yields much more accurate predictions than an offline approach in which all task needs are estimated at once.
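The estimation procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the correlation threshold `CORR_THRESHOLD = 0.8`, the helper names (`pearson`, `fit`, `split_largest_gap`, `build_estimator`, `OnlineEstimator`), and the single largest-gap split, which stands in for the unspecified clustering technique, are all hypothetical. Input sizes are assumed to be positive.

```python
from __future__ import annotations

from math import sqrt
from typing import Callable

CORR_THRESHOLD = 0.8  # hypothetical cut-off for treating a parameter as correlated


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def fit(samples: list[tuple[float, float]]) -> Callable[[float], float]:
    """Estimator for one (sub)set of (input_size, value) samples."""
    sizes = [s for s, _ in samples]
    values = [v for _, v in samples]
    if len(samples) >= 3 and abs(pearson(sizes, values)) >= CORR_THRESHOLD:
        # Correlated: mean of value / input size, scaled by the new input size
        ratio = sum(v / s for s, v in samples) / len(samples)
        return lambda size: ratio * size
    # Not correlated (or too few samples): fall back to the mean value
    mean = sum(values) / len(values)
    return lambda size: mean


def split_largest_gap(samples: list[tuple[float, float]]):
    """Hypothetical stand-in for clustering: cut the sorted samples into two
    subsets at the largest gap between consecutive input sizes."""
    ordered = sorted(samples)
    gaps = [ordered[i + 1][0] - ordered[i][0] for i in range(len(ordered) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]


def build_estimator(samples: list[tuple[float, float]]) -> Callable[[float], float]:
    """Check correlation on the whole dataset; if absent, split and fit each subset."""
    sizes = [s for s, _ in samples]
    values = [v for _, v in samples]
    if len(samples) < 4 or abs(pearson(sizes, values)) >= CORR_THRESHOLD:
        return fit(samples)
    low, high = split_largest_gap(samples)
    boundary = high[0][0]  # smallest input size in the upper subset
    est_low, est_high = fit(low), fit(high)
    return lambda size: est_low(size) if size < boundary else est_high(size)


class OnlineEstimator:
    """Minimal MAPE-K-style loop; the knowledge base is the list of observed samples."""

    def __init__(self) -> None:
        self.knowledge: list[tuple[float, float]] = []

    def monitor(self, input_size: float, observed_value: float) -> None:
        # Monitor: record the measured value of a task that just finished
        self.knowledge.append((input_size, observed_value))

    def estimate(self, input_size: float) -> float | None:
        # Analyze/Plan: rebuild the model from everything observed so far.
        # Execute (handing the new estimate to a scheduler) is left out of this sketch.
        if not self.knowledge:
            return None
        return build_estimator(self.knowledge)(input_size)
```

One way to use the sketch online is to pass, as `input_size` for a downstream task, the measured output size of its completed parent rather than the parent's estimate, so an error in the parent's estimate is not carried forward along the dependency chain. An offline variant would instead call `build_estimator` once on historical profiles and chain estimated sizes through the whole workflow.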