Designing seeds for similarity search in genomic DNA
RECOMB '03 Proceedings of the seventh annual international conference on Research in computational molecular biology
Constrained Independence System and Triangulations of Planar Point Sets
COCOON '95 Proceedings of the First Annual International Conference on Computing and Combinatorics
Designing multiple simultaneous seeds for DNA similarity search
RECOMB '04 Proceedings of the eighth annual international conference on Research in computational molecular biology
On spaced seeds for similarity search
Discrete Applied Mathematics
Efficient Methods for Generating Optimal Single and Multiple Spaced Seeds
BIBE '04 Proceedings of the 4th IEEE Symposium on Bioinformatics and Bioengineering
Optimizing Multiple Seeds for Protein Homology Search
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Vector seeds: An extension to spaced seeds
Journal of Computer and System Sciences - Special issue on bioinformatics II
Indel seeds for homology search
Bioinformatics
Rapid Homology Search with Neighbor Seeds
Algorithmica
We apply the concept of subset seeds proposed in [1] to similarity search in protein sequences. The main question studied is how to design efficient seed alphabets from which seeds with optimal sensitivity/selectivity trade-offs can be built. We propose several design methods and use them to construct several alphabets. We then compare seeds built over these alphabets with the standard Blastp seeding method [2], [3] and with the family of vector seeds proposed in [4]. Although the subset-seed formalism is less expressive (but cheaper to implement) than the cumulative scoring principle used in Blastp and vector seeds, our seeds perform similarly to, or even better than, Blastp on Bernoulli models of proteins compatible with the common BLOSUM62 matrix. Finally, we benchmark our seeds on a large scale against several major databases of protein alignments. Here again, our seeds perform comparably to or better than Blastp.
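To make the notion of a subset seed concrete, the following is a minimal sketch, not the method or alphabet from the paper. It assumes a three-letter seed alphabet: `#` accepts only identical residues, `_` (a joker) accepts any pair, and `@` accepts any pair whose two residues fall into the same class of a hypothetical amino-acid grouping (`GROUPS`, chosen purely for illustration). A seed hits a gapless alignment at position `i` when every seed letter accepts the corresponding aligned pair. The helper `background_hit_prob` (a hypothetical name) computes the per-position hit probability under a Bernoulli background model in which the two residues at each position are drawn independently; a smaller value means a more selective seed.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical residue grouping, for illustration only; it is NOT an alphabet
# designed in the paper. Together the groups cover all 20 amino acids.
GROUPS = ["ILVM", "FYW", "KRH", "DENQ", "STAG", "P", "C"]
GROUP_OF = {aa: gi for gi, group in enumerate(GROUPS) for aa in group}


def allowed(letter, a, b):
    """Return True if the seed letter accepts the aligned residue pair (a, b)."""
    if letter == "#":  # exact match required
        return a == b
    if letter == "@":  # hypothetical letter: both residues in the same group
        return GROUP_OF[a] == GROUP_OF[b]
    if letter == "_":  # joker: any pair is accepted
        return True
    raise ValueError(f"unknown seed letter {letter!r}")


def hits(seed, x, y):
    """Return every start position where the subset seed matches the
    gapless alignment of the equal-length sequences x and y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    k = len(seed)
    return [
        i
        for i in range(len(x) - k + 1)
        if all(allowed(seed[j], x[i + j], y[i + j]) for j in range(k))
    ]


def background_hit_prob(seed, freqs):
    """Probability that the seed hits at one fixed position when, at every
    seed position, both residues are drawn independently from `freqs`.
    Positions are independent (Bernoulli model), so per-letter terms multiply."""
    prob = 1.0
    for letter in seed:
        prob *= sum(
            freqs[a] * freqs[b]
            for a, b in product(AMINO_ACIDS, repeat=2)
            if allowed(letter, a, b)
        )
    return prob


if __name__ == "__main__":
    uniform = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    print(hits("#@#", "ACDK", "ACEK"))
    print(background_hit_prob("#@#", uniform))
```

With uniform residue frequencies, `#` accepts a random pair with probability 1/20, the hypothetical `@` with probability 68/400 = 0.17, and `_` always, so replacing `#` by `@` or `_` trades selectivity for tolerance to substitutions. Sensitivity, by contrast, is the probability of at least one hit in a homologous alignment of a given length; computing it requires a dynamic-programming or automaton-based method that this sketch does not attempt.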