Bioinformatics applications currently require both the processing of very large data sets and heavy computation. Meeting these requirements calls for simple ways to implement parallel computing. MapReduce is a general-purpose parallelization model that seems particularly well suited to this task, and an open source implementation (Hadoop) is available. Here we report on its application to three relevant algorithms: BLAST, GSEA and GRAMMAR. The first is characterized by relatively lightweight computation on large data sets, while the second requires heavy processing of relatively small data sets. The third combines these two computational profiles. Our results are encouraging and indicate that the framework could have a wide range of bioinformatics applications while maintaining good computational efficiency, scalability and ease of maintenance.
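To make the MapReduce model named above concrete, the following is a minimal single-process sketch of its map, shuffle and reduce phases. It is an illustration only: it does not use Hadoop, and it is not the BLAST, GSEA or GRAMMAR implementation reported here. The task (counting k-mers), the function names `map_kmers`, `shuffle`, `reduce_counts` and `run_job`, and the toy sequences are all assumptions chosen for brevity.

```python
from collections import defaultdict
from itertools import chain


def map_kmers(record, k=3):
    """Map step (hypothetical task): emit (k-mer, 1) for each window of one sequence."""
    _name, seq = record
    return [(seq[i:i + k], 1) for i in range(len(seq) - k + 1)]


def shuffle(pairs):
    """Group emitted values by key, the role the framework plays between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: combine all values emitted for one key."""
    return key, sum(values)


def run_job(records, k=3):
    """Run map, shuffle and reduce sequentially; a real framework distributes each phase."""
    mapped = chain.from_iterable(map_kmers(record, k) for record in records)
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


# Toy input, invented for illustration: (name, sequence) pairs.
records = [("seq1", "ACGTAC"), ("seq2", "GTACGT")]
counts = run_job(records)
print(counts)
```

Because each call to `map_kmers` depends only on its own record, and each call to `reduce_counts` only on its own key, both phases can run in parallel across machines without further coordination. That independence is what lets one model serve both data-heavy and compute-heavy workloads.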