Why and Where: A Characterization of Data Provenance
ICDT '01 Proceedings of the 8th International Conference on Database Theory
Kepler: An Extensible System for Design and Execution of Scientific Workflows
SSDBM '04 Proceedings of the 16th International Conference on Scientific and Statistical Database Management
VisTrails: visualization meets data management
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Proceedings of the 5th IEEE workshop on Challenges of large applications in distributed environments
Introducing secure provenance: problems and challenges
Proceedings of the 2007 ACM workshop on Storage security and survivability
Clustal W and Clustal X version 2.0
Bioinformatics
Falkon: a Fast and Light-weight tasK executiON framework
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Provenance and scientific workflows: challenges and opportunities
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Provenance for Computational Tasks: A Survey
Computing in Science and Engineering
Toward loosely coupled programming on petascale systems
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
ESCIENCE '08 Proceedings of the 2008 Fourth IEEE International Conference on eScience
A break in the clouds: towards a cloud definition
ACM SIGCOMM Computer Communication Review
Workflows and e-Science: An overview of workflow system features and capabilities
Future Generation Computer Systems
Optimizing user views for workflows
Proceedings of the 12th International Conference on Database Theory
Querying and Managing Provenance through User Views in Scientific Workflows
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
A MapReduce-Enabled Scientific Workflow Composition Framework
ICWS '09 Proceedings of the 2009 IEEE International Conference on Web Services
MapReduce: a flexible data processing tool
Communications of the ACM - Amir Pnueli: Ahead of His Time
Exploring many task computing in scientific workflows
Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers
CLOUD '10 Proceedings of the 2010 IEEE 3rd International Conference on Cloud Computing
Case study for running HPC applications in public clouds
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Data parallelism in bioinformatics workflows using Hydra
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Generating sound workflow views for correct provenance analysis
ACM Transactions on Database Systems (TODS)
Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud
CLOUDCOM '10 Proceedings of the 2010 IEEE Second International Conference on Cloud Computing Technology and Science
Hybrid Computing-Where HPC meets grid and Cloud Computing
Future Generation Computer Systems
SciPhy: a cloud-based workflow for phylogenetic analysis of drug targets in protozoan genomes
BSB'11 Proceedings of the 6th Brazilian conference on Advances in bioinformatics and computational biology
More convenient more overhead: the performance evaluation of Hadoop streaming
Proceedings of the 2011 ACM Symposium on Research in Applied Computation
Adapting scientific computing problems to clouds using MapReduce
Future Generation Computer Systems
An adaptive parallel execution strategy for cloud-based scientific workflows
Concurrency and Computation: Practice & Experience
A Provenance-based Adaptive Scheduling Heuristic for Parallel Scientific Workflows in Clouds
Journal of Grid Computing
The characteristics and performance of groups of jobs in grids
Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing
High performance cloud computing
Future Generation Computer Systems
Hi-index | 0.00 |
Data analysis is an exploratory process that demands high performance computing (HPC). SciPhylomics, for example, is a data-intensive workflow that aims at producing phylogenomic trees based on an input set of protein sequences of genomes to infer evolutionary relationships among living organisms. SciPhylomics can benefit from parallel processing techniques provided by existing approaches such as SciCumulus cloud workflow engine and MapReduce implementations such as Hadoop. Despite some performance fluctuations, computing clouds provide a new dimension for HPC due to its elasticity and availability features. In this paper, we present a performance evaluation for SciPhylomics executions in a real cloud environment. The workflow was executed using two parallel execution approaches (SciCumulus and Hadoop) at the Amazon EC2 cloud. Our results reinforce the benefits of parallelizing data for the phylogenomic inference workflow using MapReduce-like parallel approaches in the cloud. The performance results demonstrate that this class of bioinformatics experiment is suitable to be executed in the cloud despite its need for high performance capabilities. The evaluated workflow shares many features of several data intensive workflows, which present first insights that these cloud execution results can be extrapolated to other classes of experiments.