We present a framework for improving local protein alignment algorithms. Specifically, we discuss how to extend local protein aligners to use a collection of vector seeds or ungapped alignment seeds to reduce noise hits. We model the choice of a set of seed models as an integer programming problem and give algorithms for choosing such a set. Although the problem is NP-hard, and quasi-NP-hard to approximate to within a logarithmic factor, it can be solved easily in practice. A good set of seeds that we have chosen yields four to five times fewer false-positive hits than BLASTP while preserving essentially the same sensitivity.
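To make the seed-selection problem concrete, the following is a minimal sketch of a greedy, set-cover-style heuristic. It is not the integer programming formulation or the algorithms of this work. It assumes that each candidate seed is described by the set of training alignments it hits and by a single false-positive rate, and that false-positive rates of chosen seeds simply add up. The names `greedy_seed_cover`, `hits`, `fp_rate`, and `min_covered` are hypothetical and introduced only for illustration.

```python
from typing import Dict, List, Set, Tuple


def greedy_seed_cover(
    hits: Dict[str, Set[int]],
    fp_rate: Dict[str, float],
    min_covered: int,
) -> Tuple[List[str], float]:
    """Greedily pick seeds until at least `min_covered` alignments are hit.

    hits[s]    : indices of training alignments that seed s hits (assumed input)
    fp_rate[s] : false-positive rate of seed s, assumed positive and additive
    Returns the chosen seeds in selection order and their summed rate.
    """
    covered: Set[int] = set()
    chosen: List[str] = []
    total_fp = 0.0

    while len(covered) < min_covered:
        best_seed = None
        best_ratio = 0.0
        for seed, aligned in hits.items():
            if seed in chosen:
                continue
            gain = len(aligned - covered)  # alignments newly hit by this seed
            if gain == 0:
                continue
            ratio = gain / fp_rate[seed]  # new coverage per unit of noise
            if ratio > best_ratio:
                best_seed, best_ratio = seed, ratio
        if best_seed is None:
            raise ValueError("remaining seeds cannot reach the coverage target")
        chosen.append(best_seed)
        covered |= hits[best_seed]
        total_fp += fp_rate[best_seed]

    return chosen, total_fp


# Toy instance: five training alignments and four candidate seeds.
example_hits = {"A": {0, 1, 2}, "B": {2, 3}, "C": {3, 4}, "D": {0, 1, 2, 3, 4}}
example_fp = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 4.0}
seeds, noise = greedy_seed_cover(example_hits, example_fp, min_covered=5)
print(seeds, noise)
```

In this toy instance the single seed D hits every alignment but is noisy, so the heuristic prefers the cheaper pair A and C. A greedy rule of this kind carries only a logarithmic approximation guarantee for set cover, which is consistent with the hardness result stated above; an exact integer programming solver could be substituted when the instance is small enough.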