In recent years, comparative genomics analyses have become more compute-intensive because of the explosive growth in the number of available genome sequences. Comparative genomics analysis is an important a priori step for experiments in various bioinformatics domains, and it can be used to improve the performance and quality of experiments in areas such as evolution and phylogeny. A typical phylogenetic analysis makes extensive use of Multiple Sequence Alignment (MSA) to construct phylogenetic trees, which are used to infer evolutionary relationships between homologous genes. Each phylogenetic analysis explores several MSA methods to determine which one produces the highest-quality trees. This exploration may run for weeks, even in High Performance Computing (HPC) environments. Although many approaches model and parallelize phylogenetic analysis as scientific workflows, exploring all MSA methods remains a complex and expensive task. If scientists could determine a priori the most adequate MSA method for a phylogenetic analysis, they would save time and, in some cases, money. Comparative genomics analyses play an important role in optimizing phylogenetic analysis workflows. In this paper, we extend SciHmm, a scientific workflow aimed at determining the most suitable MSA method, so that it can be used in a phylogenetic analysis. SciHmm relies on SciCumulus, a cloud workflow execution engine, for parallel execution. Experimental results show that using SciHmm considerably reduces the total execution time of the phylogenetic analysis (by up to 80%). The experiments also show that, as expected, trees built with the MSA program selected by SciHmm were of higher quality than those built with the other programs. In addition, the parallel execution of SciHmm shows that this kind of bioinformatics workflow has an excellent cost/benefit ratio when executed in cloud environments.
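The argument that choosing an MSA method a priori saves time can be made concrete with a simple cost model. The following is a minimal sketch under stated assumptions, not SciHmm's actual selection procedure. It treats each candidate MSA method as an alignment step followed by a tree-construction step with hypothetical durations. It then compares running the full pipeline for every method against paying a single up-front selection cost and running the pipeline once. Parallelism is ignored, so the model measures total compute time (which drives cloud cost) rather than wall-clock time. The class `MsaMethod`, the functions, the method names, and all timings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MsaMethod:
    """A candidate MSA program in a phylogenetic workflow (hypothetical timings)."""
    name: str
    align_hours: float  # assumed time to produce the multiple alignment
    tree_hours: float   # assumed time to build and evaluate trees from it


def exhaustive_cost(methods):
    """Total compute time when the full pipeline runs once per MSA method."""
    return sum(m.align_hours + m.tree_hours for m in methods)


def a_priori_cost(methods, selection_hours, chosen):
    """Pay an assumed selection cost up front, then run only the chosen pipeline."""
    pick = next(m for m in methods if m.name == chosen)
    return selection_hours + pick.align_hours + pick.tree_hours


def savings(methods, selection_hours, chosen):
    """Fraction of total compute time saved relative to exhaustive exploration."""
    return 1 - a_priori_cost(methods, selection_hours, chosen) / exhaustive_cost(methods)


if __name__ == "__main__":
    # Four hypothetical methods, each needing 1 h to align and 9 h for trees.
    methods = [MsaMethod(f"method_{c}", 1.0, 9.0) for c in "abcd"]
    print(exhaustive_cost(methods))                        # 40.0
    print(a_priori_cost(methods, 2.0, "method_b"))         # 12.0
    print(f"{savings(methods, 2.0, 'method_b'):.0%}")      # 70%
```

With these made-up numbers, selection pays off whenever its cost is smaller than the tree-construction work it avoids for the discarded methods. The actual savings reported for SciHmm come from the paper's experiments, not from this model.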