Scheduling data processing workflows (dataflows) on the cloud is a complex and challenging task. It is essentially an optimization problem, very similar to query optimization, but it differs from traditional problems in two respects: its space of alternative schedules is very rich, due to the various optimization opportunities that cloud computing offers; and its optimization criterion is at least two-dimensional, with the monetary cost of using the cloud being at least as important as query completion time. In this paper, we study the scheduling of dataflows that involve arbitrary data processing operators in the context of three different problems: 1) minimize completion time given a fixed budget, 2) minimize monetary cost given a deadline, and 3) find trade-offs between completion time and monetary cost without any a priori constraints. We formulate these problems and present an approximate optimization framework that addresses them by exploiting resource elasticity in the cloud. To investigate the effectiveness of our approach, we incorporate the framework into a prototype system for dataflow evaluation and instantiate it with several greedy, probabilistic, and exhaustive search algorithms. Finally, through experiments conducted with the prototype elastic optimizer on numerous scientific and synthetic dataflows, we identify several general characteristics of the space of alternative schedules, as well as the advantages and disadvantages of the various search algorithms. The overall results are promising and indicate the effectiveness of our approach.
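The three problems named in the abstract can be illustrated with a small sketch. It is not the paper's framework or any of its search algorithms: it assumes a set of candidate schedules has already been enumerated, each with an estimated completion time and monetary cost, and it only shows how the three selection criteria differ. The `Schedule` type, the function names, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Schedule:
    """A candidate schedule (hypothetical type, not from the paper)."""
    name: str
    time: float   # estimated completion time, e.g. minutes (assumed unit)
    money: float  # estimated monetary cost, e.g. dollars (assumed unit)


def fastest_within_budget(schedules: List[Schedule], budget: float) -> Optional[Schedule]:
    """Problem 1: minimize completion time given a fixed budget."""
    feasible = [s for s in schedules if s.money <= budget]
    return min(feasible, key=lambda s: (s.time, s.money), default=None)


def cheapest_within_deadline(schedules: List[Schedule], deadline: float) -> Optional[Schedule]:
    """Problem 2: minimize monetary cost given a deadline."""
    feasible = [s for s in schedules if s.time <= deadline]
    return min(feasible, key=lambda s: (s.money, s.time), default=None)


def pareto_front(schedules: List[Schedule]) -> List[Schedule]:
    """Problem 3: keep only schedules not dominated in both time and money."""
    front: List[Schedule] = []
    best_money = float("inf")
    # Scan from fastest to slowest; a schedule survives only if it is
    # strictly cheaper than every schedule that is at least as fast.
    for s in sorted(schedules, key=lambda s: (s.time, s.money)):
        if s.money < best_money:
            front.append(s)
            best_money = s.money
    return front


# Made-up candidates for illustration only.
candidates = [
    Schedule("A", time=10, money=8),
    Schedule("B", time=20, money=5),
    Schedule("C", time=25, money=6),
    Schedule("D", time=40, money=2),
    Schedule("E", time=40, money=3),
]

by_budget = fastest_within_budget(candidates, budget=5)
by_deadline = cheapest_within_deadline(candidates, deadline=45)
trade_offs = [s.name for s in pareto_front(candidates)]
```

With these made-up numbers, the budget-constrained and deadline-constrained problems each return a single schedule, while the unconstrained problem returns a set of trade-offs from which a user would choose. In practice the space of schedules is far too large to enumerate, which is why the abstract describes an approximate framework instantiated with greedy, probabilistic, and exhaustive search.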