A metric model of amino acid substitution

Authors:
Weijia Xu;Daniel P. Miranker
Affiliations:
Department of Computer Sciences, The Center for Computational Biology and Bioinformatics, University of Texas, Austin, TX 78712, USA;Department of Computer Sciences, The Center for Computational Biology and Bioinformatics, University of Texas, Austin, TX 78712, USA
Venue:
Bioinformatics
Year:
2004

Citing 0
Cited 6

On Optimizing Distance-Based Similarity Search for Biological Databases
CSB '05 Proceedings of the 2005 IEEE Computational Systems Bioinformatics Conference

Improved alignment of protein sequences based on common parts
ISBRA'08 Proceedings of the 4th international conference on Bioinformatics research and applications

Dimension reduction for distance-based indexing
Proceedings of the Third International Conference on SImilarity Search and APplications

Metric-space search in bioinformatics
SIGSPATIAL Special

Detecting fuzzy amino acid tandem repeats in protein sequences
Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine

Pivot selection: Dimension reduction for distance-based indexing
Journal of Discrete Algorithms

Abstract

Motivation: We ask whether there exists an effective evolutionary model of amino acid substitution that forms a metric distance function. Competing computational methods for determining sequence homology always trade speed against sensitivity. A metric model of evolution is a prerequisite for developing an entire class of fast sequence analysis algorithms that are both scalable, O(log n), and sensitive.

Results: We have reworked the mathematics of the point accepted mutation (PAM) model by calculating the expected time between accepted mutations instead of log-odds probabilities. The resulting substitution matrix (mPAM) forms a metric. We validate the mPAM evolutionary model for sequence homology by executing sequence queries from a controlled yeast protein homology search benchmark. We compare the accuracy of results obtained with the mPAM and PAM similarity matrices, as well as with three prior metric models. The experiment shows that mPAM significantly outperforms the other three metrics and comes close enough to the sensitivity of PAM250 to be applicable to the management of protein sequence databases.
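To make the central requirement concrete, the following minimal sketch checks whether a substitution cost matrix satisfies the metric axioms (non-negativity, identity, symmetry and the triangle inequality). The function name `is_metric` and both toy matrices are illustrative assumptions; the numbers are invented for a three-letter alphabet and are not mPAM or PAM entries.

```python
from itertools import product


def is_metric(dist, tol=1e-9):
    """Return True if the square matrix `dist` satisfies the metric axioms."""
    n = len(dist)
    for i, j in product(range(n), repeat=2):
        if dist[i][j] < -tol:                    # non-negativity
            return False
        if abs(dist[i][j] - dist[j][i]) > tol:   # symmetry
            return False
        if i == j and abs(dist[i][j]) > tol:     # d(x, x) must be 0
            return False
        if i != j and dist[i][j] <= tol:         # distinct symbols must be apart
            return False
    for i, j, k in product(range(n), repeat=3):
        # Triangle inequality: a direct substitution never costs more
        # than going through an intermediate symbol.
        if dist[i][k] > dist[i][j] + dist[j][k] + tol:
            return False
    return True


# Hypothetical 3-symbol cost matrices (made-up values, not from the paper).
toy_metric = [
    [0, 2, 3],
    [2, 0, 4],
    [3, 4, 0],
]
toy_non_metric = [
    [0, 1, 5],
    [1, 0, 1],
    [5, 1, 0],   # 5 > 1 + 1, so the triangle inequality fails
]

print(is_metric(toy_metric))      # True
print(is_metric(toy_non_metric))  # False
```

A log-odds similarity matrix such as PAM250 would fail this check as given, since similarity scores have non-zero diagonal entries and can be negative; that is the gap a metric reformulation is meant to close.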