We present improved techniques for finding homologous regions in DNA and protein sequences. Our approach focuses on the core regions of a local pairwise alignment; we suggest new ways to characterize these regions that markedly improve both specificity and sensitivity over existing sequence alignment techniques. For any such characterization, which we call a vector seed, we give an efficient algorithm that estimates the seed's specificity and sensitivity under reasonable probabilistic models of sequences. We also characterize the probability of a match when an alignment must have multiple hits before it is detected. Our extensions fit well with existing approaches to sequence alignment while still offering substantial improvements in runtime and sensitivity, particularly for the important problem of identifying matches between homologous coding DNA sequences.
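The quantities named above can be made concrete with a small sketch. It assumes the simplest setting. An ungapped alignment is written as a 0/1 string (1 = match). A vector seed is a vector of integer weights v plus a threshold T, and it hits at position i when the weighted sum of the window starting at i reaches T. Alignment columns follow an i.i.d. Bernoulli model. The names `VectorSeed`, `hit_probability` and `sensitivity` are hypothetical. The dynamic program (over the last span − 1 columns and a capped hit count) is a straightforward exact method for short seeds and is not necessarily the paper's algorithm. The "at least k hits anywhere" rule is likewise only a simplified stand-in for multiple-hit detection.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import product
from typing import Sequence


@dataclass(frozen=True)
class VectorSeed:
    """Hypothetical vector seed (v, T) over 0/1 alignment columns."""
    weights: tuple[int, ...]  # the vector v, one weight per window column
    threshold: int            # the threshold T

    @property
    def span(self) -> int:
        return len(self.weights)

    def hits_at(self, alignment: Sequence[int], i: int) -> bool:
        # Hit at position i: weighted sum of alignment[i:i+span] reaches T.
        window = alignment[i:i + self.span]
        return sum(w * a for w, a in zip(self.weights, window)) >= self.threshold

    def hit_positions(self, alignment: Sequence[int]) -> list[int]:
        return [i for i in range(len(alignment) - self.span + 1)
                if self.hits_at(alignment, i)]


def hit_probability(seed: VectorSeed, q: float) -> float:
    """Probability of a hit at one fixed position when each column matches
    independently with probability q (e.g. q = 0.25 for unrelated DNA).
    Lower means more specific. Enumerates all 2**span windows."""
    total = 0.0
    for window in product((0, 1), repeat=seed.span):
        if seed.hits_at(window, 0):
            m = sum(window)
            total += q ** m * (1 - q) ** (seed.span - m)
    return total


def sensitivity(seed: VectorSeed, n: int, p: float, k: int = 1) -> float:
    """Probability that an alignment of length n, each column a match with
    probability p independently (assumed i.i.d. model), has at least k hits.
    Exact DP over (last span-1 columns, hit count capped at k); needs k >= 1."""
    keep = seed.span - 1
    dist = {((), 0): 1.0}
    for _ in range(n):
        nxt = defaultdict(float)
        for (suffix, hits), prob in dist.items():
            for bit, pb in ((1, p), (0, 1.0 - p)):
                window = suffix + (bit,)
                new_hits = hits
                # A full window just ended here: test it for a hit.
                if len(window) == seed.span and seed.hits_at(window, 0):
                    new_hits = min(hits + 1, k)
                # Keep only the columns a future window can still use.
                state = window[-keep:] if keep else ()
                nxt[(state, new_hits)] += prob * pb
        dist = nxt
    return sum(prob for (_, hits), prob in dist.items() if hits >= k)
```

For example, the spaced seed 110101 corresponds to weights (1, 1, 0, 1, 0, 1) with T = 4. For that seed, `VectorSeed((1, 1, 0, 1, 0, 1), 4).hit_positions([1, 1, 1, 1, 0, 1, 1, 0])` returns `[0]`. A seed with weights (1, 1, 1, 1) and T = 3 ("at least three matches in four columns") has no single spaced-seed equivalent. Specificity is summarized here by `hit_probability` at a background match rate such as q = 0.25. The expected number of spurious hits is roughly that probability times the number of position pairs compared.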