Heuristic sequence alignment and database search algorithms, such as PatternHunter and BLAST, are based on the initial discovery of so-called alignment seeds of well-conserved alignment patterns, which are subsequently extended to full local alignments. In recent years, the theory of classical seeds (matching contiguous q-grams) has been extended to spaced seeds, which allow mismatches within a seed, and subsequently to indel seeds, which allow gaps in the underlying alignment. Different seeds within a given class of seeds are usually compared by their sensitivity, that is, the probability of matching an alignment generated from a particular probabilistic alignment model. We present a flexible, exact, unifying framework called the probabilistic arithmetic automaton for seed sensitivity computation that includes all previous results on spaced and indel seeds. In addition, we can easily incorporate sets of arbitrary seeds. Instead of only computing the probability of at least one hit (the standard definition of sensitivity), we can optionally provide the entire distribution of overlapping or non-overlapping seed hits, which yields a different characterization of a seed. A symbolic representation allows fast computation for any set of parameters.
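To make the standard definition of sensitivity concrete, the following minimal sketch computes the probability of at least one hit for a single spaced seed. It is not the probabilistic arithmetic automaton framework itself. It assumes the simplest alignment model, an ungapped alignment of fixed length whose columns are independent matches with probability `p`, and it tracks only the last few columns as the state of a small dynamic program. The function name `seed_sensitivity` and the seed notation (`1` for a position that must match, `0` for a don't-care position) are illustrative choices, not taken from the paper.

```python
def seed_sensitivity(seed: str, length: int, p: float) -> float:
    """Probability that `seed` hits at least once in a random ungapped alignment.

    seed   -- string over {'1', '0'}: '1' = must match, '0' = don't care
    length -- number of alignment columns
    p      -- match probability of each column (i.i.d. Bernoulli model, an assumption)
    """
    span = len(seed)
    if length < span:
        return 0.0  # the seed does not fit into the alignment at all
    required = [i for i, c in enumerate(seed) if c == "1"]

    # no_hit maps "last span-1 columns seen" -> probability of reaching that
    # history without any seed hit so far. Histories are tuples, oldest first.
    no_hit = {(): 1.0}
    for _ in range(length):
        nxt = {}
        for history, prob in no_hit.items():
            for bit, q in ((1, p), (0, 1.0 - p)):
                window = history + (bit,)
                full = len(window) == span
                if full and all(window[i] for i in required):
                    continue  # seed hit: this probability mass leaves the no-hit set
                state = window[1:] if full else window
                nxt[state] = nxt.get(state, 0.0) + prob * q
        no_hit = nxt

    # Sensitivity is the complement of "no hit anywhere in the alignment".
    return 1.0 - sum(no_hit.values())


if __name__ == "__main__":
    # Compare a contiguous seed with a spaced seed of the same weight (3).
    for s in ("111", "1101"):
        print(s, seed_sensitivity(s, 20, 0.7))
```

The number of states grows exponentially with the seed span, so this sketch is practical only for short seeds. It also yields only the probability of at least one hit, whereas the framework described above can additionally provide the full distribution of hit counts and handle indel seeds and sets of seeds.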