Biodoop: Bioinformatics on Hadoop

  • Authors:
  • Simone Leo; Federico Santoni; Gianluigi Zanetti

  • Venue:
  • ICPPW '09: Proceedings of the 2009 International Conference on Parallel Processing Workshops
  • Year:
  • 2009

Abstract

Bioinformatics applications currently require both processing of huge amounts of data and heavy computation. Fulfilling these requirements calls for simple ways to implement parallel computing. MapReduce is a general-purpose parallelization model that seems particularly well-suited to this task and for which an open source implementation (Hadoop) is available. Here we report on its application to three relevant algorithms: BLAST, GSEA and GRAMMAR. The first is characterized by relatively low-weight computation on large data sets, while the second requires heavy processing of relatively small data sets. The third one can be considered as containing a mixture of these two computational flavors. Our results are encouraging and indicate that the framework could have a wide range of bioinformatics applications while maintaining good computational efficiency, scalability and ease of maintenance.
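
To make the MapReduce model mentioned in the abstract concrete, the following is a minimal single-process sketch in plain Python. It is not taken from the paper and does not use the Hadoop API: the k-mer counting task, the function names (map_kmers, shuffle, reduce_counts, kmer_count) and the toy sequences are all assumptions chosen for illustration. A real framework would run the map and reduce steps in parallel across machines and perform the grouping itself.

```python
from collections import defaultdict
from itertools import chain


def map_kmers(sequence, k=3):
    """Map step (hypothetical example): emit a (k-mer, 1) pair per window."""
    return [(sequence[i:i + k], 1) for i in range(len(sequence) - k + 1)]


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle phase."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the counts emitted for a single k-mer."""
    return key, sum(values)


def kmer_count(sequences, k=3):
    """Run map, shuffle and reduce sequentially over all input sequences."""
    mapped = chain.from_iterable(map_kmers(s, k) for s in sequences)
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


# Toy input, not real data: two short DNA strings.
counts = kmer_count(["ACGTAC", "GTACGT"])
print(counts)
```

Because each call to map_kmers depends only on its own sequence and each call to reduce_counts only on one key's values, both steps could in principle be distributed independently, which is the property that makes the model attractive for large sequence data sets.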