Speeding-up codon analysis on the cloud with local MapReduce aggregation

Authors:
Atanas Radenski;Louis Ehwerhemuepha
Venue:
Information Sciences: an International Journal
Year:
2014

Quantified Score

Hi-index	0.07

Abstract

A notable obstacle to higher performance of data-intensive Hadoop MapReduce (MR) bioinformatics algorithms is the large volume of intermediate data that must be sorted, shuffled, and transmitted between mapper and reducer tasks. This difficulty is especially clear in MR codon analysis, which generates voluminous intermediate data that create a bottleneck in basic MR codon analysis algorithms. Our proposed approach to the intermediate data bottleneck is local in-mapper aggregation (or simply local aggregation), a technique that reduces the volume of intermediate data passed between mapper and reducer tasks in MR. We evaluate local aggregation (i) by developing MR codon analysis algorithms with and without local aggregation and (ii) by measuring their performance on Amazon Web Services (AWS), the Amazon cloud platform. Codon analysis with local aggregation maintains consistently high performance as datasets grow, while basic codon analysis, without local aggregation, becomes impractically slow even for smaller datasets. Our results can benefit (i) members of the bioinformatics community who need to perform fast and cost-effective MR nucleotide analysis on the cloud and (ii) computer scientists who strive to increase the performance of MR algorithms.
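
The contrast between basic and locally aggregated codon analysis can be illustrated with a small, single-process sketch. It is not the authors' implementation and does not use Hadoop. It assumes that codon analysis reduces to counting codon frequencies, that codons are read as non-overlapping triplets from reading frame 0, and that each input string stands in for one mapper's input split. All function names (`codons`, `basic_mapper`, `aggregating_mapper`, `reducer`) and the sample sequences are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def codons(sequence: str) -> Iterator[str]:
    """Yield non-overlapping 3-letter codons, starting at offset 0 (assumed frame)."""
    for i in range(0, len(sequence) - 2, 3):
        yield sequence[i:i + 3]


def basic_mapper(sequence: str) -> Iterator[Tuple[str, int]]:
    """Basic mapper: emit one (codon, 1) pair per codon occurrence."""
    for codon in codons(sequence):
        yield codon, 1


def aggregating_mapper(sequence: str) -> Iterator[Tuple[str, int]]:
    """In-mapper aggregation: count locally, then emit one pair per distinct codon."""
    local_counts = Counter(codons(sequence))
    yield from local_counts.items()


def reducer(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum the counts for each codon across all mapper outputs."""
    totals: Counter = Counter()
    for codon, count in pairs:
        totals[codon] += count
    return dict(totals)


# Two toy input splits (made-up sequences, one per simulated mapper).
splits = ["ATGAAAATGTAA", "ATGCCCAAATAA"]

# Intermediate data that would be sorted, shuffled, and sent to reducers.
basic_pairs = [p for s in splits for p in basic_mapper(s)]
aggregated_pairs = [p for s in splits for p in aggregating_mapper(s)]

print(len(basic_pairs), len(aggregated_pairs))
print(reducer(basic_pairs) == reducer(aggregated_pairs))
```

Both variants produce the same final counts, but the aggregating mapper emits at most one pair per distinct codon in its split. Under the stated assumptions there are at most 64 distinct codons over the alphabet A, C, G, T, so its output per split stays bounded while the basic mapper's output grows with the length of the sequence.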