Good spaced seeds for homology search
Bioinformatics
Tracking repeats using significance and transitivity
Bioinformatics
Statistics of local multiple alignments
Bioinformatics
AuberGene---a sensitive genome alignment tool
Bioinformatics
HMMoC—a compiler for hidden Markov models
Bioinformatics
Procrastination leads to efficient filtration for local multiple alignment
WABI'06 Proceedings of the 6th international conference on Algorithms in Bioinformatics
The identification of homologous DNA is a fundamental building block of comparative genomic and molecular evolution studies. To date, pairwise local sequence alignment methods have been the prevailing technique to identify homologous nucleotides. However, existing methods that identify and align all homologous nucleotides in one or more genomes have suffered from poor scalability and limited accuracy. We propose a novel method that couples a gapped extension heuristic with a previously described efficient filtration method for local multiple alignment. During gapped extension, we use the MUSCLE implementation of progressive multiple alignment with iterative refinement. The resulting gapped extensions may contain alignments of unrelated sequence. We detect and remove such undesirable alignments using a hidden Markov model (HMM) to predict the posterior probability of homology. The HMM emission frequencies for nucleotide substitutions can be derived from any strand/species-symmetric nucleotide substitution matrix, and we have developed a method to adapt an arbitrary substitution matrix (e.g. HOXD) to organisms with different G+C content. We evaluate the performance of our method and previous approaches on a hybrid dataset of real genomic DNA with simulated interspersed repeats. Our method outperforms existing methods in terms of sensitivity, positive predictive value, and localization of homology boundaries. The described methods have been implemented in the free, open-source procrastAligner software, available from: http://alggen.lsi.upc.es/recerca/align/procrastination.
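The following minimal Python sketch illustrates the idea of predicting a posterior probability of homology. It uses a two-state HMM (homologous vs. unrelated) over ungapped pairwise alignment columns and decodes it with the forward-backward algorithm.

This is not the procrastAligner implementation. The following are all hypothetical:

* the toy score function (these are not HOXD values)
* the scaling constant `lam`
* the transition probability `stay`
* every function name

For the homologous state, the sketch treats scores as log-odds and sets each emission proportional to q_a * q_b * exp(lam * s(a, b)), then renormalizes. The background q is made strand-symmetric from a G+C fraction. This simple background reweighting is an assumed stand-in, not the authors' G+C adaptation method. Gaps are ignored for brevity.

```python
import math

BASES = "ACGT"


def background(gc):
    """Strand-symmetric background: P(G) = P(C) = gc/2, P(A) = P(T) = (1 - gc)/2."""
    return {"A": (1 - gc) / 2, "C": gc / 2, "G": gc / 2, "T": (1 - gc) / 2}


def toy_score(a, b):
    """Hypothetical toy scores (NOT HOXD): match +2, transition -1, transversion -2."""
    if a == b:
        return 2.0
    transitions = {frozenset("AG"), frozenset("CT")}
    return -1.0 if frozenset((a, b)) in transitions else -2.0


def emission_tables(gc, lam=0.5):
    """Return (homologous, unrelated) emission tables keyed by (base_x, base_y)."""
    q = background(gc)
    # Homologous state: treat scores as log-odds, p(a,b) ~ q_a q_b exp(lam * s(a,b)),
    # then renormalize so the table sums to 1 (assumed simplification).
    raw = {(a, b): q[a] * q[b] * math.exp(lam * toy_score(a, b))
           for a in BASES for b in BASES}
    z = sum(raw.values())
    hom = {k: v / z for k, v in raw.items()}
    # Unrelated state: the two bases are independent draws from the background.
    unrel = {(a, b): q[a] * q[b] for a in BASES for b in BASES}
    return hom, unrel


def posterior_homology(columns, gc=0.5, stay=0.99, start_h=0.5):
    """Forward-backward posterior P(homologous | all columns) for each column."""
    hom, unrel = emission_tables(gc)
    emit = [hom, unrel]  # state 0 = homologous, state 1 = unrelated
    trans = [[stay, 1 - stay], [1 - stay, stay]]
    n = len(columns)

    # Scaled forward pass: each fwd[t] is normalized to sum to 1.
    fwd, scale = [], []
    for t, col in enumerate(columns):
        if t == 0:
            f = [[start_h, 1 - start_h][s] * emit[s][col] for s in range(2)]
        else:
            f = [sum(fwd[-1][r] * trans[r][s] for r in range(2)) * emit[s][col]
                 for s in range(2)]
        c = sum(f)
        fwd.append([v / c for v in f])
        scale.append(c)

    # Scaled backward pass, reusing the forward scaling factors.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        nxt = columns[t + 1]
        b = [sum(trans[s][r] * emit[r][nxt] * bwd[t + 1][r] for r in range(2))
             for s in range(2)]
        bwd[t] = [v / scale[t + 1] for v in b]

    # Posterior of the homologous state at each column.
    post = []
    for t in range(n):
        h = fwd[t][0] * bwd[t][0]
        u = fwd[t][1] * bwd[t][1]
        post.append(h / (h + u))
    return post
```

For example, `posterior_homology([("A", "A")] * 20 + [("A", "C")] * 20)` should give posteriors near 1 over the matching run and near 0 over the mismatching run. In a filter like the one described above, columns whose posterior falls below a threshold would be trimmed from the gapped extension. The parameter `stay` controls how sharply the boundary between the two regions is localized.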