Protein similarity search with subset seeds on a dedicated reconfigurable hardware

Authors:
Pierre Peterlongo;Laurent Noé;Dominique Lavenier;Gilles Georges;Julien Jacques;Gregory Kucherov;Mathieu Giraud
Affiliations:
Symbiose, IRISA, INRIA, CNRS, Université Rennes 1;Sequoia/Bioinfo, LIFL, INRIA, CNRS, Université Lille 1;Symbiose, IRISA, INRIA, CNRS, Université Rennes 1;Symbiose, IRISA, INRIA, CNRS, Université Rennes 1;Symbiose, IRISA, INRIA, CNRS, Université Rennes 1;Sequoia/Bioinfo, LIFL, INRIA, CNRS, Université Lille 1;Sequoia/Bioinfo, LIFL, INRIA, CNRS, Université Lille 1
Venue:
PPAM'07 Proceedings of the 7th international conference on Parallel processing and applied mathematics
Year:
2007

Abstract

With the sharp increase in available DNA and protein sequence data, new precise and fast similarity search methods are needed for large-scale genome and proteome comparisons. Modern seed-based similarity search techniques (spaced seeds, multiple seeds, subset seeds) provide a better sensitivity/specificity trade-off. We present an implementation of such a seed-based technique on parallel specialized hardware built around a reconfigurable architecture (FPGA), in which the FPGA is tightly coupled to large-capacity Flash memories. This parallel system allows large databases to be fully indexed and rapidly accessed. Compared to the traditional approach represented by the Blastp software, we obtain both a significant speed-up and better results. To the best of our knowledge, this is the first attempt to exploit efficient seed-based algorithms for parallelizing sequence similarity search.
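
To make the idea of indexing a database with a subset seed more concrete, the following is a minimal software sketch, not the hardware implementation described in the paper. In a subset seed, each seed position either requires identical residues or accepts any two residues from the same group of an alphabet partition. The grouping in `GROUPS_A`, the 4-position seed in `SEED`, and the helper names `make_mapper`, `seed_key`, `build_index`, and `find_hits` are all illustrative assumptions; the paper's actual seeds and index layout are not given in the abstract.

```python
from collections import defaultdict

# Illustrative amino-acid grouping (an assumption, not the paper's seed alphabet).
GROUPS_A = ["LVIM", "C", "AG", "ST", "P", "FYW", "EDNQ", "KRH"]
EXACT = None  # marks a seed position that requires identical residues


def make_mapper(groups):
    """Return a function mapping a residue to its group representative."""
    if groups is None:
        return lambda aa: aa  # exact position: the residue is its own key
    table = {aa: group[0] for group in groups for aa in group}
    return lambda aa: table.get(aa, aa)


# Hypothetical seed of span 4: exact, grouped, grouped, exact.
SEED = [make_mapper(EXACT), make_mapper(GROUPS_A),
        make_mapper(GROUPS_A), make_mapper(EXACT)]


def seed_key(word):
    """Project a word of len(SEED) residues onto its seed key."""
    return "".join(mapper(aa) for mapper, aa in zip(SEED, word))


def build_index(sequences):
    """Index every database position by the seed key of the word starting there."""
    index = defaultdict(list)
    span = len(SEED)
    for seq_id, seq in enumerate(sequences):
        for pos in range(len(seq) - span + 1):
            index[seed_key(seq[pos:pos + span])].append((seq_id, pos))
    return index


def find_hits(index, query):
    """Return (query_pos, seq_id, db_pos) for every position pair sharing a seed key."""
    span = len(SEED)
    hits = []
    for qpos in range(len(query) - span + 1):
        key = seed_key(query[qpos:qpos + span])
        for seq_id, dpos in index.get(key, []):
            hits.append((qpos, seq_id, dpos))
    return hits


database = ["MKLVAG", "PPSTEE"]
index = build_index(database)
hits = find_hits(index, "KIMA")  # K-I-M-A and K-L-V-A share the key "KLLA"
```

In this sketch the query word `KIMA` hits the database word `KLVA` even though the two middle residues differ, because I, M, L, and V fall in the same assumed group. A query starting with `R` instead of `K` would not hit, since the first position is exact. In a full search pipeline each hit would then be extended and scored; that step is omitted here.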