Using Bloom Filters for Large Scale Gene Sequence Analysis in Haskell

Authors:
Ketil Malde; Bryan O'Sullivan
Affiliations:
Institute of Marine Research, Bergen, Norway; Serpentine Green Design, San Francisco, USA
Venue:
PADL '09 Proceedings of the 11th International Symposium on Practical Aspects of Declarative Languages
Year:
2009

Citing 11
Cited 2

Suffix arrays: a new method for on-line string searches

SIAM Journal on Computing
Lazy functional state threads

PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
QuickCheck: a lightweight tool for random testing of Haskell programs

ICFP '00 Proceedings of the fifth ACM SIGPLAN international conference on Functional programming
Space/time trade-offs in hash coding with allowable errors

Communications of the ACM
Optimal Exact String Matching Based on Suffix Arrays

SPIRE 2002 Proceedings of the 9th International Symposium on String Processing and Information Retrieval
Replacing suffix trees with enhanced suffix arrays

Journal of Discrete Algorithms - SPIRE 2002
RBR: library-less repeat detection for ESTs

Bioinformatics
Less hashing, same performance: building a better Bloom filter

ESA'06 Proceedings of the 14th conference on Annual European Symposium - Volume 14
Linear pattern matching algorithms

SWAT '73 Proceedings of the 14th Annual Symposium on Switching and Automata Theory (swat 1973)
Rewriting Haskell strings

PADL'07 Proceedings of the 9th international conference on Practical Aspects of Declarative Languages
Space and time efficient parallel algorithms and software for EST clustering

IEEE Transactions on Parallel and Distributed Systems

Don't thrash: how to cache your hash on flash

HotStorage'11 Proceedings of the 3rd USENIX conference on Hot topics in storage and file systems
Don't thrash: how to cache your hash on flash

Proceedings of the VLDB Endowment

Abstract

Analysis of biological data often involves large data sets and computationally expensive algorithms. Databases of biological data continue to grow, leading to an increasing demand for improved algorithms and data structures. Despite having many advantages over more traditional indexing structures, the Bloom filter is almost unused in bioinformatics. Here we present a robust and efficient Bloom filter implementation in Haskell, and implement a simple bioinformatics application for indexing and matching sequence data. We use this to index the chromosomes that make up the human genome, and map all available gene sequences to it. Our experiences with developing and tuning our application suggest that for bioinformatics applications, Haskell offers a compelling combination of rapid development, quality assurance, and high performance.