Biodoop: Bioinformatics on Hadoop

Authors:
Simone Leo;Federico Santoni;Gianluigi Zanetti
Venue:
ICPPW '09 Proceedings of the 2009 International Conference on Parallel Processing Workshops
Year:
2009

Citing 0
Cited 4

Comparing Hadoop and Fat-Btree based access method for small file I/O applications
WAIM'10 Proceedings of the 11th international conference on Web-age information management

MapReducing a genomic sequencing workflow
Proceedings of the second international workshop on MapReduce and its applications

A framework for readapting and running bioinformatics applications in the cloud
Proceedings of the 2012 ACM Research in Applied Computation Symposium

Methodological Review: 'Big data', Hadoop and cloud computing in genomics
Journal of Biomedical Informatics

Abstract

Bioinformatics applications currently require both the processing of huge amounts of data and heavy computation. Fulfilling these requirements calls for simple ways to implement parallel computing. MapReduce is a general-purpose parallelization model that seems particularly well suited to this task and for which an open-source implementation (Hadoop) is available. Here we report on its application to three relevant algorithms: BLAST, GSEA and GRAMMAR. The first is characterized by relatively lightweight computation on large data sets, while the second requires heavy processing of relatively small data sets. The third can be considered a mixture of these two computational flavors. Our results are encouraging and indicate that the framework could have a wide range of bioinformatics applications while maintaining good computational efficiency, scalability and ease of maintenance.
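
To make the MapReduce model named in the abstract concrete, the following is a minimal single-process sketch in plain Python. It is not Biodoop's code and does not use Hadoop: the k-mer counting task, the function names (mapper, shuffle, reducer, run_mapreduce) and the toy sequences are all assumptions chosen for illustration. It only shows the three stages the model relies on: a map step that emits key/value pairs per input record, a grouping step by key, and a reduce step that combines the values for each key.

```python
from collections import defaultdict
from itertools import chain


def mapper(sequence, k=3):
    """Map step: emit (k-mer, 1) for every k-length window of one sequence."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k], 1


def shuffle(pairs):
    """Group emitted values by key (done by the framework in a real cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce step: combine all values emitted for one key."""
    return key, sum(values)


def run_mapreduce(sequences, k=3):
    """Run map, shuffle and reduce sequentially over a list of sequences."""
    mapped = chain.from_iterable(mapper(seq, k) for seq in sequences)
    grouped = shuffle(mapped)
    return dict(reducer(key, values) for key, values in grouped.items())


# Hypothetical toy input: two short DNA strings.
counts = run_mapreduce(["ACGTAC", "GTACGT"])
print(counts)
```

Because each mapper call depends only on its own record and each reducer call only on its own key, a framework such as Hadoop can distribute both steps across machines; the sequential loop here stands in for that distribution.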