Indel seeds for homology search

Authors:
Denise Mak;Yevgeniy Gelfand;Gary Benson
Venue:
Bioinformatics
Year:
2006

Abstract

We are interested in detecting homologous genomic DNA sequences with the goal of locating approximate inverted, interspersed, and tandem repeats. Standard search techniques start by detecting small matching parts, called seeds, between a query sequence and database sequences. Contiguous seed models have existed for many years. Recently, spaced seeds were shown to be more sensitive than contiguous seeds without increasing the random hit rate. To determine the superiority of one seed model over another, a model of homologous sequence alignment must be chosen. Previous studies evaluating spaced and contiguous seeds have assumed that matches and mismatches occur within these alignments, but not insertions and deletions (indels). This is perhaps appropriate when searching for protein coding sequences, but not when homologies include indels. We introduce indel seeds, a seed model that explicitly allows indels. We present a waiting time formula for computing the sensitivity of an indel seed and show that indel seeds significantly outperform contiguous and spaced seeds when homologies include indels. We discuss the practical aspects of using indel seeds, and finally we present results from a search for inverted repeats in the dog genome using both indel and spaced seeds.

Contact: dyfmak@bu.edu
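
To make the difference between contiguous and spaced seeds concrete, here is a minimal sketch, not the authors' implementation. It assumes the common notation in which an alignment without indels is written as a string of `1` (match) and `0` (mismatch) columns, and a seed is a string of `1` (must match) and `*` (don't care). The function name `seed_hits` and the example strings are hypothetical. Indel seeds are not modeled here, since the abstract does not give their exact notation.

```python
def seed_hits(seed: str, alignment: str) -> bool:
    """Return True if `seed` matches `alignment` at some offset.

    seed:      '1' = this position must be a match, '*' = don't care.
    alignment: '1' = match column, '0' = mismatch column (no indels).
    """
    span = len(seed)
    # Slide the seed across every offset where it fits entirely.
    for start in range(len(alignment) - span + 1):
        window = alignment[start:start + span]
        # Only the '1' positions of the seed constrain the window.
        if all(a == "1" for s, a in zip(seed, window) if s == "1"):
            return True
    return False


# Both seeds require four matching positions (weight 4).
contiguous = "1111"
spaced = "11*11"

# Hypothetical alignment with a mismatch in every third column.
alignment = "1101101101"

print(seed_hits(contiguous, alignment))  # no run of four matches
print(seed_hits(spaced, alignment))      # the '*' absorbs one mismatch
```

On this example the contiguous seed finds no hit, while the spaced seed of the same weight does, because its don't-care position lines up with a mismatch. An alignment containing an insertion or deletion cannot be written as a single match/mismatch string at all, which is the case the abstract says earlier evaluations left out.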