Motivation: We address the question of whether there exists an effective evolutionary model of amino-acid substitution that forms a metric distance function. Competing computational methods for determining sequence homology always trade speed against sensitivity. A metric model of evolution is a prerequisite for developing an entire class of fast sequence-analysis algorithms that are both scalable (O(log n)) and sensitive.

Results: We have reworked the mathematics of the point accepted mutation (PAM) model by calculating the expected time between accepted mutations in lieu of log-odds probabilities. The resulting substitution matrix (mPAM) forms a metric. We validate the mPAM evolutionary model for sequence homology by executing sequence queries from a controlled yeast protein homology search benchmark. We compare the accuracy of results from mPAM and PAM similarity matrices as well as three prior metric models. The experiment shows that mPAM significantly outperforms the three prior metrics and comes close enough to the sensitivity of PAM250 to be applicable to the management of protein sequence databases.
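
To make the phrase "forms a metric" concrete, the sketch below checks the standard metric axioms (zero self-distance, positivity, symmetry, triangle inequality) on a small square matrix. The function name `is_metric` and both toy matrices are hypothetical illustrations; they are not mPAM or PAM values, and this is not the validation procedure used in the paper. Log-odds similarity matrices such as PAM250 fail such a check directly, since their diagonal entries are non-zero and many off-diagonal entries are negative.

```python
from itertools import product


def is_metric(d, tol=1e-9):
    """Return True if the square matrix d (list of lists) satisfies the metric axioms."""
    n = len(d)
    # Identity: every element is at distance zero from itself.
    for i in range(n):
        if abs(d[i][i]) > tol:
            return False
    for i, j in product(range(n), repeat=2):
        # Positivity: distinct elements are strictly separated.
        if i != j and d[i][j] <= tol:
            return False
        # Symmetry: d(i, j) == d(j, i).
        if abs(d[i][j] - d[j][i]) > tol:
            return False
    # Triangle inequality: d(i, k) <= d(i, j) + d(j, k) for all triples.
    for i, j, k in product(range(n), repeat=3):
        if d[i][k] > d[i][j] + d[j][k] + tol:
            return False
    return True


# Hypothetical 3x3 toy matrices (not real substitution data).
toy_metric = [[0, 2, 3],
              [2, 0, 4],
              [3, 4, 0]]
toy_non_metric = [[0, 1, 5],
                  [1, 0, 1],
                  [5, 1, 0]]  # 5 > 1 + 1 violates the triangle inequality

print(is_metric(toy_metric))
print(is_metric(toy_non_metric))
```

The triangle inequality is the property that distance-based indexes rely on to prune candidates without comparing the query against every database entry, which is why a metric substitution model matters for scalable search.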