Scalable script-based data analysis workflows on clouds

  • Authors: Fabrizio Marozzo, Domenico Talia, Paolo Trunfio

  • Affiliations: University of Calabria, Italy (all three authors)

  • Venue: WORKS '13: Proceedings of the 8th Workshop on Workflows in Support of Large-Scale Science
  • Year: 2013

Abstract

Data analysis workflows are often composed of many concurrent, compute-intensive tasks that can be executed efficiently only on scalable computing infrastructures such as HPC systems, Grids, and Cloud platforms. The use of Cloud services for the scalable execution of data analysis workflows is the key feature of the Data Mining Cloud Framework (DMCF), which provides a Web interface for developing data analysis applications using a visual workflow formalism. In this paper we describe how we extended DMCF to also support the design and execution of script-based data analysis workflows on Clouds. We introduce a workflow language, named JS4Cloud, that extends JavaScript to support the implementation of Cloud-based data analysis tasks and the handling of data on the Cloud. We also describe how data analysis workflows programmed in JS4Cloud are processed by DMCF to make parallelism explicit and to enable their scalable execution on Clouds. Finally, we present a data analysis application developed with JS4Cloud and the performance results obtained by executing it with DMCF on the Windows Azure platform.
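
As a rough illustration of what "making parallelism explicit" can mean for a script-based workflow, the Python sketch below derives groups of independent tasks from the data each task reads and writes. It is not JS4Cloud code and not the algorithm DMCF uses, which the abstract does not specify. The function name `parallel_stages`, the task names, and the data names are all hypothetical, chosen only for this example.

```python
def parallel_stages(tasks):
    """Group tasks into stages whose members can run concurrently.

    tasks: dict mapping a task name to (set of input data names,
           set of output data names). All names are hypothetical.
    """
    # Record which task produces each data item.
    producer = {}
    for name, (_, outputs) in tasks.items():
        for item in outputs:
            producer[item] = name

    # A task depends on the producers of its inputs; inputs that no
    # task produces are treated as data that already exists.
    deps = {
        name: {producer[item] for item in inputs if item in producer}
        for name, (inputs, _) in tasks.items()
    }

    stages, done, remaining = [], set(), set(tasks)
    while remaining:
        # Every task whose dependencies are all complete is ready now.
        ready = sorted(t for t in remaining if deps[t] <= done)
        if not ready:
            raise ValueError("cyclic data dependency between tasks")
        stages.append(ready)
        done.update(ready)
        remaining.difference_update(ready)
    return stages


# Hypothetical workflow: split a dataset, train three models, pick the best.
tasks = {
    "partition": ({"dataset"}, {"train", "test"}),
    "train_0": ({"train"}, {"model_0"}),
    "train_1": ({"train"}, {"model_1"}),
    "train_2": ({"train"}, {"model_2"}),
    "evaluate": ({"test", "model_0", "model_1", "model_2"}, {"best_model"}),
}

for stage in parallel_stages(tasks):
    print(stage)
```

In this sketch the three training tasks end up in the same stage because none of them reads data that another one writes, so a scheduler could dispatch them to separate Cloud workers at the same time.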