This paper proposes and evaluates an approach to the parallelization, deployment and management of bioinformatics applications that integrates several emerging technologies for distributed computing. The approach uses the MapReduce paradigm to parallelize tools and manage their execution, machine virtualization to encapsulate their execution environments and commonly used data sets into flexibly deployable virtual machines, and network virtualization to connect resources behind firewalls/NATs while preserving the necessary performance and the communication environment. An implementation is described and used to demonstrate and evaluate the approach. It integrates Hadoop, Virtual Workspaces, and ViNe as the MapReduce, virtual machine and virtual network technologies, respectively, to deploy the commonly used bioinformatics tool NCBI BLAST on a WAN-based test bed consisting of clusters at two distinct locations, the University of Florida and the University of Chicago. This WAN-based implementation, called CloudBLAST, was evaluated against both non-virtualized and LAN-based implementations to assess the overheads of machine and network virtualization, which were shown to be insignificant. To compare the approach against an MPI-based solution, CloudBLAST performance was experimentally contrasted with the publicly available mpiBLAST on the same WAN-based test bed. Both versions showed performance gains as the number of available processors increased, with CloudBLAST delivering a speedup of 57 versus 52.4 for the MPI version when 64 processors at two sites were used. The results encourage the use of the proposed approach for the execution of large-scale bioinformatics applications on emerging distributed environments that provide access to computing resources as a service.
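To put the reported speedups in perspective, they can be converted to parallel efficiency (speedup divided by processor count). The following is a minimal sketch, not part of the paper: it assumes the speedups are measured relative to a single-processor run, and the helper name `parallel_efficiency` is hypothetical.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Return speedup divided by processor count (1.0 means ideal linear scaling)."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


# Speedups reported in the abstract for 64 processors across two sites.
reported = {"CloudBLAST": 57.0, "mpiBLAST": 52.4}

# Assumption: both speedups share the same single-processor baseline.
efficiencies = {name: parallel_efficiency(s, 64) for name, s in reported.items()}

for name, eff in efficiencies.items():
    print(f"{name}: efficiency = {eff:.3f}")
```

Under that assumption, CloudBLAST runs at roughly 89% efficiency and mpiBLAST at roughly 82% on the 64-processor configuration.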