Predicting Application Run Times Using Historical Information
IPPS/SPDP '98 Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing
Pace--A Toolset for the Performance Prediction of Parallel and Distributed Systems
International Journal of High Performance Computing Applications
Cost-Based Scheduling of Scientific Workflow Application on Utility Grids
E-SCIENCE '05 Proceedings of the First International Conference on e-Science and Grid Computing
Amazon S3 for science grids: a viable solution?
DADC '08 Proceedings of the 2008 international workshop on Data-aware distributed computing
The cost of doing science on the cloud: the Montage example
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Proceedings of the 2008 annual research conference of the South African Institute of Computer Scientists and Information Technologists on IT research in developing countries: riding the wave of technology
Empirical evaluation of latency-sensitive application performance in the cloud
MMSys '10 Proceedings of the first annual ACM SIGMM conference on Multimedia systems
EC2 performance analysis for resource provisioning of service-oriented applications
ICSOC/ServiceWave'09 Proceedings of the 2009 international conference on Service-oriented computing
Hi-index | 0.00 |
Text analysis tools are nowadays required to process increasingly large corpora which are often organized as small files (abstracts, news articles, etc). Cloud computing offers a convenient, on-demand, pay-as-you-go computing environment for solving such problems. We investigate provisioning on the Amazon EC2 cloud from the user perspective, attempting to provide a scheduling strategy that is both timely and cost effective. We rely on the empirical performance of the application of interest on smaller subsets of data, to construct an execution plan. A first goal of our performance measurements is to determine an optimal file size for our application to consume. Using the subset-sum first fit heuristic we reshape the input data by merging files in order to match as closely as possible the desired file size. This also speeds up the task of retrieving the results of our application, by having the output be less segmented. Using predictions of the performance of our application based on measurements on small data sets, we devise an execution plan that meets a user specified deadline while minimizing cost.