Using Bloom Filters for Large Scale Gene Sequence Analysis in Haskell

  • Authors: Ketil Malde; Bryan O'Sullivan
  • Affiliations: Institute of Marine Research, Bergen, Norway; Serpentine Green Design, San Francisco, USA
  • Venue: PADL '09: Proceedings of the 11th International Symposium on Practical Aspects of Declarative Languages
  • Year: 2009

Abstract

Analysis of biological data often involves large data sets and computationally expensive algorithms. Databases of biological data continue to grow, leading to an increasing demand for improved algorithms and data structures. Despite having many advantages over more traditional indexing structures, the Bloom filter is almost unused in bioinformatics. Here we present a robust and efficient Bloom filter implementation in Haskell, and implement a simple bioinformatics application for indexing and matching sequence data. We use this to index the chromosomes that make up the human genome, and map all available gene sequences to it. Our experiences with developing and tuning our application suggest that for bioinformatics applications, Haskell offers a compelling combination of rapid development, quality assurance, and high performance.
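
To make the idea concrete, here is a minimal sketch of a Bloom filter used to index sequence data. It is not the authors' implementation: the names (`Bloom`, `emptyBloom`, `insert`, `member`, `kmers`, `indexSequence`), the seeded FNV-1a-style hash, the `Integer`-backed bit array, and the choice to index fixed-length words (k-mers) are all assumptions made for illustration, and the code favours brevity over the performance the paper is concerned with.

```haskell
import Data.Bits (setBit, testBit, xor)
import Data.Char (ord)
import Data.List (foldl', tails)
import Data.Word (Word64)

-- | Toy Bloom filter (hypothetical type): the bit array is stored in an
-- arbitrary-precision Integer, alongside its size in bits and the number
-- of hash functions applied to each element.
data Bloom = Bloom
  { bloomBits   :: !Integer
  , bloomSize   :: !Int
  , bloomHashes :: !Int
  }

-- | An empty filter with m bits and k hash functions.
emptyBloom :: Int -> Int -> Bloom
emptyBloom m k = Bloom 0 m k

-- | FNV-1a-style string hash; the seed perturbs the offset basis so that
-- different seeds behave as different hash functions (an assumption of
-- this sketch, not a claim about hash quality).
hashWith :: Word64 -> String -> Word64
hashWith seed = foldl' step (14695981039346656037 `xor` seed)
  where
    step h c = (h `xor` fromIntegral (ord c)) * 1099511628211

-- | Bit positions that an element maps to, one per hash function.
positions :: Bloom -> String -> [Int]
positions b s =
  [ fromIntegral (hashWith (fromIntegral i) s `mod` fromIntegral (bloomSize b))
  | i <- [1 .. bloomHashes b]
  ]

-- | Insert an element by setting all of its bit positions.
insert :: String -> Bloom -> Bloom
insert s b = b { bloomBits = foldl' setBit (bloomBits b) (positions b s) }

-- | Membership test: False means definitely absent; True means
-- "possibly present", since false positives can occur.
member :: String -> Bloom -> Bool
member s b = all (testBit (bloomBits b)) (positions b s)

-- | All overlapping words of length k in a sequence.
kmers :: Int -> String -> [String]
kmers k s = [w | t <- tails s, let w = take k t, length w == k]

-- | Index a sequence by inserting each of its k-mers into the filter.
indexSequence :: Int -> String -> Bloom -> Bloom
indexSequence k s b = foldl' (flip insert) b (kmers k s)
```

For example, after `indexSequence 4 "ACGTACGT" (emptyBloom 1024 3)`, a query such as `member "GTAC"` is guaranteed to return `True`, because a Bloom filter never produces false negatives. A query for a word that was not indexed usually returns `False`, but may return `True` with a probability that depends on the filter size, the number of hash functions, and the number of inserted elements.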