Pairwise local sequence alignment methods have been the prevailing technique for identifying homologous nucleotides between related species. However, existing methods that identify and align all homologous nucleotides in one or more genomes have suffered from poor scalability and limited accuracy. We propose a novel method that couples a gapped extension heuristic with an efficient filtration method for identifying interspersed repeats in genome sequences. During gapped extension, we use the MUSCLE implementation of progressive global multiple alignment with iterative refinement. The resulting gapped extensions may contain alignments of unrelated sequences. We detect and remove such undesirable alignments using a hidden Markov model (HMM) to predict the posterior probability of homology. The HMM emission frequencies for nucleotide substitutions can be derived from any time-reversible nucleotide substitution matrix. We evaluate the performance of our method and previous approaches on a hybrid data set of real genomic DNA with simulated interspersed repeats. Our method outperforms a related method in terms of sensitivity, positive predictive value, and localization of homology boundaries. The described methods have been implemented in the freely available software Repeatoire, which can be obtained from: http://wwwabi.snv.jussieu.fr/public/Repeatoire.
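To illustrate the idea of using an HMM to predict the posterior probability of homology, the following is a minimal sketch and not the Repeatoire implementation. It assumes an ungapped pairwise alignment, a two-state model (homologous "H" and unrelated "U"), and emissions for the homologous state derived from the Jukes-Cantor model, chosen here only because it is a simple time-reversible substitution model. The function names and the parameters `d`, `stay`, and `start_h` are hypothetical and picked for illustration.

```python
import math

STATES = ("H", "U")  # H = homologous, U = unrelated (assumed two-state model)


def jc69_joint(a, b, d):
    """Joint probability of the aligned pair (a, b) under Jukes-Cantor
    at distance d (expected substitutions per site); symmetric in a and b."""
    decay = math.exp(-4.0 * d / 3.0)
    cond = 0.25 + 0.75 * decay if a == b else 0.25 - 0.25 * decay
    return 0.25 * cond  # uniform base frequency times conditional probability


def emission(state, a, b, d):
    """Emission probability of an alignment column in the given state."""
    if state == "H":
        return jc69_joint(a, b, d)
    return 0.25 * 0.25  # unrelated: the two bases are independent


def posterior_homology(seq1, seq2, d=0.3, stay=0.95, start_h=0.5):
    """Per-column posterior probability of the H state (forward-backward)."""
    n = len(seq1)
    if n != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    if n == 0:
        return []
    trans = {(r, s): (stay if r == s else 1.0 - stay)
             for r in STATES for s in STATES}
    start = {"H": start_h, "U": 1.0 - start_h}

    # Forward pass, rescaled at every column to avoid numerical underflow.
    fwd, scales = [], []
    for i in range(n):
        row = {}
        for s in STATES:
            e = emission(s, seq1[i], seq2[i], d)
            if i == 0:
                row[s] = start[s] * e
            else:
                row[s] = e * sum(fwd[i - 1][r] * trans[(r, s)] for r in STATES)
        c = sum(row.values())
        fwd.append({s: row[s] / c for s in STATES})
        scales.append(c)

    # Backward pass, reusing the forward scaling factors.
    bwd = [None] * n
    bwd[n - 1] = {s: 1.0 for s in STATES}
    for i in range(n - 2, -1, -1):
        bwd[i] = {
            s: sum(trans[(s, r)]
                   * emission(r, seq1[i + 1], seq2[i + 1], d)
                   * bwd[i + 1][r] for r in STATES) / scales[i + 1]
            for s in STATES
        }

    # Combine both passes and normalize per column.
    post = []
    for i in range(n):
        total = sum(fwd[i][s] * bwd[i][s] for s in STATES)
        post.append(fwd[i]["H"] * bwd[i]["H"] / total)
    return post
```

Under these assumptions, columns whose posterior falls below some chosen cutoff would be the candidates for trimming from a gapped extension. A realistic model would also need to account for gap columns and multiple aligned sequences, which this sketch leaves out.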