A Class of Evolution-Based Kernels for Protein Homology Analysis: A Generalization of the PAM Model

Authors:
Valentina Sulimova; Vadim Mottl; Boris Mirkin; Ilya Muchnik; Casimir Kulikowski
Affiliations:
Tula State University, Tula, Russia 300600; Computing Center of the Russian Academy of Sciences, Moscow, Russia 119333; Birkbeck College, University of London, London, UK WC1E 7HX; Rutgers University, New Brunswick, USA 08903; Rutgers University, New Brunswick, USA 08903
Venue:
ISBRA '09: Proceedings of the 5th International Symposium on Bioinformatics Research and Applications
Year:
2009

Abstract

Two properties are desirable in a pairwise similarity measure between amino acid sequences if it is to perform well in protein homology analysis. First, it should have kernel properties (positive semidefiniteness), which allow the use of popular, well-performing computational tools designed for linear spaces, such as SVM and k-means. Second, it should take into account the common evolutionary descent of homologous proteins. However, none of the existing similarity measures possesses both properties at once. In this paper, we propose a simple probabilistic evolution model of amino acid sequences, built as a straightforward generalization of the PAM evolution model of single amino acids. This model produces a class of kernel functions, each of which is computed as the likelihood of the hypothesis that both sequences result from two independent evolutionary transformations of a hidden common ancestor, under specific assumptions on the evolution mechanism. The proposed class of kernels is rather wide: it contains as particular subclasses not only the family of J.-P. Vert's local alignment kernels, whose algebraic structure was introduced without any evolutionary motivation, but also several other families of local and global kernels. We demonstrate, via k-means clustering of a set of amino acid sequences from the VIDA database, that the global kernel can be useful in bringing together otherwise very different protein families.
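To make the common-ancestor idea concrete, the following is a minimal Python sketch under stated assumptions, not the authors' model. For single letters, the likelihood that a and b descend independently from one hidden ancestor c is K(a, b) = Σ_c ξ(c) P(a|c) P(b|c), where ξ is a prior over ancestor letters and P(a|c) is a substitution probability. The alphabet, the frequencies `XI`, and the one-parameter substitution model in `transition` are hypothetical toys, not PAM values. The function `ungapped_kernel` is likewise a hypothetical simplification that ignores insertions and deletions, so it only hints at the sequence-level kernels described above.

```python
from itertools import product
from math import prod

# Hypothetical 4-letter toy alphabet and ancestor frequencies (NOT real PAM values).
ALPHABET = "ACDE"
XI = {"A": 0.4, "C": 0.1, "D": 0.3, "E": 0.2}


def transition(mu):
    """Toy substitution model: P[c][a] = P(a | c), the probability that ancestor
    letter c becomes letter a. Keep c with prob 1 - mu, else pick another letter uniformly."""
    n = len(ALPHABET)
    return {c: {a: (1 - mu) if a == c else mu / (n - 1) for a in ALPHABET}
            for c in ALPHABET}


def letter_kernel(a, b, P, xi=XI):
    """Likelihood that a and b are two independent descendants of one hidden
    ancestor c ~ xi: K(a, b) = sum_c xi[c] * P(a | c) * P(b | c)."""
    return sum(xi[c] * P[c][a] * P[c][b] for c in ALPHABET)


def ungapped_kernel(s, t, P, xi=XI):
    """Hypothetical simplified global kernel for equal-length sequences with no
    insertions or deletions: positions evolve independently, so the likelihood factorizes."""
    if len(s) != len(t):
        raise ValueError("this sketch only handles equal-length sequences")
    return prod(letter_kernel(a, b, P, xi) for a, b in zip(s, t))


def quadratic_form(K, v):
    """Return v^T K v for a square matrix given as nested lists."""
    return sum(v[i] * K[i][j] * v[j] for i in range(len(v)) for j in range(len(v)))


P = transition(0.2)
K = [[letter_kernel(a, b, P) for b in ALPHABET] for a in ALPHABET]

# K = P^T diag(xi) P, so v^T K v = sum_c xi[c] * (sum_a P(a|c) * v_a)**2 >= 0:
# the letter kernel is symmetric and positive semidefinite, i.e. a valid kernel.
assert all(quadratic_form(K, v) >= -1e-12
           for v in product([-1, 0, 1], repeat=len(ALPHABET)))
print(ungapped_kernel("ACD", "ACE", P))
```

The factorized sequence kernel stays positive semidefinite because a product of kernels over positions (a tensor-product kernel) is again a kernel. Handling gaps would require summing over alignments, which this sketch deliberately omits.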