A run-time system for efficient execution of scientific workflows on distributed environments

Authors:
George Teodoro;Tullo Tavares;Renato Ferreira;Tahsin Kurc;Wagner Meira;Dorgival Guedes;Tony Pan;Joel Saltz
Affiliations:
Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil;Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil;Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil;Department of Biomedical Informatics, The Ohio State University, Columbus, OH;Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil;Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil;Department of Biomedical Informatics, The Ohio State University, Columbus, OH;Department of Biomedical Informatics, The Ohio State University, Columbus, OH
Venue:
International Journal of Parallel Programming
Year:
2008



Abstract

Scientific workflow systems have been introduced in response to demand from researchers in several scientific domains who need to process and analyze increasingly large datasets. The design of these systems is largely based on the observation that data analysis applications can be composed as pipelines or networks of computations on data. In this work, we present a run-time support system designed to facilitate this type of computation in distributed computing environments. Our system is optimized for data-intensive workflows, in which efficient management and retrieval of data, coordination of data processing and data movement, and checkpointing of intermediate results are critical and challenging issues. Experimental evaluation of our system shows that linear speedups can be achieved for sophisticated applications implemented as networks of multiple data processing components.