Large-scale bioinformatics experiments are usually composed of a set of data flows generated by a chain of activities (programs or services) that can be modeled as scientific workflows. Scientific Workflow Management Systems (SWfMS) orchestrate these workflows, controlling and monitoring their execution. Because bioinformatics experiments commonly process very large datasets, data parallelism is a common approach to improve performance and reduce overall execution time. However, most current SWfMS still lack support for parallel execution in high performance computing (HPC) environments. In addition, keeping track of provenance data in distributed environments remains an open yet important problem. Recently, the Hydra middleware was proposed to bridge the gap between SWfMS and HPC environments by providing a transparent way for scientists to parallelize workflow executions while capturing distributed provenance. This paper analyzes data parallelism scenarios in the bioinformatics domain and presents an extension to the Hydra middleware: a specific cartridge that promotes data parallelism in bioinformatics workflows. Experimental results using workflows with BLAST show performance gains, with the additional benefit of distributed provenance support.
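As an illustration of the data parallelism scenario described above, the following is a minimal sketch, not the Hydra cartridge itself. It assumes the input is a multi-FASTA text, splits it into fragments, runs a task on each fragment concurrently, and keeps a small per-fragment record as a stand-in for distributed provenance. The names `split_fasta`, `run_in_parallel`, and the `task` callable are hypothetical; in a real workflow `task` would wrap a BLAST invocation.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def split_fasta(text: str, n_fragments: int) -> List[str]:
    """Distribute FASTA records round-robin over at most n_fragments chunks."""
    if n_fragments < 1:
        raise ValueError("n_fragments must be at least 1")
    # A record starts at a line beginning with '>' and runs to the next one.
    records: List[List[str]] = []
    for line in text.splitlines():
        if line.startswith(">"):
            records.append([line])
        elif records and line.strip():
            records[-1].append(line)
    # Never create more fragments than there are records.
    k = min(n_fragments, len(records))
    fragments: List[List[str]] = [[] for _ in range(k)]
    for i, record in enumerate(records):
        fragments[i % k].append("\n".join(record))
    return ["\n".join(f) + "\n" for f in fragments]


def run_in_parallel(
    fragments: List[str],
    task: Callable[[str], Any],  # hypothetical stand-in for a BLAST call
    max_workers: int = 4,
) -> List[Dict[str, Any]]:
    """Run task on every fragment and return one provenance-like record each."""

    def timed(indexed):
        index, fragment = indexed
        start = time.perf_counter()
        output = task(fragment)
        return {
            "fragment": index,
            "output": output,
            "seconds": time.perf_counter() - start,
        }

    # pool.map preserves input order, so records line up with fragments.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(timed, enumerate(fragments)))


if __name__ == "__main__":
    sample = ">a\nACGT\n>b\nGG\nTT\n>c\nAA\n"
    parts = split_fasta(sample, 2)
    # Toy task: count the sequences in each fragment.
    for entry in run_in_parallel(parts, lambda frag: frag.count(">")):
        print(entry["fragment"], entry["output"])
```

Round-robin splitting is only one possible fragmentation policy, chosen here for brevity; splitting by total sequence length would balance work better when sequence sizes vary widely.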