Speeding-up codon analysis on the cloud with local MapReduce aggregation
Information Sciences: an International Journal
A notable obstacle to higher performance of data-intensive Hadoop MapReduce (MR) bioinformatics algorithms is the large volume of intermediate data that must be sorted, shuffled, and transmitted between mapper and reducer tasks. The difficulty is especially clear in MR codon analysis, which generates voluminous intermediate data that become a bottleneck in basic MR codon analysis algorithms. To handle this bottleneck we propose local in-mapper aggregation (or simply local aggregation), a technique that reduces the volume of intermediate data passed from mapper to reducer tasks. We evaluate local aggregation (i) by developing MR codon analysis algorithms with and without it and (ii) by measuring their performance on Amazon Web Services (AWS), the Amazon cloud platform. Codon analysis with local aggregation maintains consistently high performance as datasets grow, while basic codon analysis without local aggregation becomes impractically slow even for smaller datasets. Our results can benefit (i) members of the bioinformatics community who need to perform fast and cost-effective nucleotide MR analysis on the cloud and (ii) computer scientists who strive to increase the performance of MR algorithms.
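The idea of local aggregation can be illustrated with a small sketch. The abstract does not give the authors' Hadoop implementation, so the following is a plain-Python simulation under stated assumptions: the function names (`codons`, `basic_mapper`, `aggregating_mapper`, `reducer`) are hypothetical, codons are read as non-overlapping triplets from reading frame 0, and no Hadoop API is used. It shows only that a mapper which counts locally emits one pair per distinct codon rather than one pair per occurrence, while the reduced result stays the same.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple


def codons(seq: str) -> Iterator[str]:
    """Yield non-overlapping triplets from reading frame 0 (an assumption)."""
    for i in range(0, len(seq) - 2, 3):
        yield seq[i:i + 3]


def basic_mapper(seq: str) -> List[Tuple[str, int]]:
    """Basic mapper: emit one (codon, 1) pair per codon occurrence."""
    return [(c, 1) for c in codons(seq)]


def aggregating_mapper(seq: str) -> List[Tuple[str, int]]:
    """Mapper with local aggregation: count in memory, emit one pair per distinct codon."""
    return list(Counter(codons(seq)).items())


def reducer(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum the partial counts for each codon."""
    totals: Counter = Counter()
    for codon, n in pairs:
        totals[codon] += n
    return dict(totals)


if __name__ == "__main__":
    seq = "ATGATGATGTAA"  # toy input, not data from the paper
    basic = basic_mapper(seq)
    local = aggregating_mapper(seq)
    print(len(basic), "pairs without local aggregation")
    print(len(local), "pairs with local aggregation")
    print(reducer(basic) == reducer(local))
```

On real inputs the number of distinct codons is bounded by 64, so the aggregated mapper output stays small regardless of sequence length, at the cost of holding a small table in mapper memory.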