Peptide classification using optimal and information theoretic syntactic modeling

Authors:
E. Aygün;B. J. Oommen;Z. Cataltepe
Affiliations:
Computer Engineering Department, Istanbul Technical University, Maslak, Istanbul 34469, Turkey;School of Computer Science, Carleton University, Ottawa, Ontario, Canada K1S 5B6;Computer Engineering Department, Istanbul Technical University, Maslak, Istanbul 34469, Turkey
Venue:
Pattern Recognition
Year:
2010

Abstract

We consider the problem of classifying peptides using the information residing in their syntactic representations. This problem, which has been studied for more than a decade, has typically been investigated using distance-based metrics that involve the edit operations required in the peptide comparisons. In this paper, we demonstrate that the Optimal and Information Theoretic (OIT) model of Oommen and Kashyap [22], which is applicable to syntactic pattern recognition, can be used to tackle the peptide classification problem. We propose that the differences between compared strings can be modeled as a mutation model consisting of random substitutions, insertions and deletions obeying the OIT model. We then show that the probability measure obtained from the OIT model can be perceived as a sequence similarity metric, with which a support vector machine (SVM)-based peptide classifier can be devised. The classifier we have built has been tested with eight different substitution matrices and on two different data sets, namely, the HIV-1 Protease cleavage sites and the T-cell epitopes. The results show that the OIT model performs significantly better than the approach that uses a Needleman-Wunsch sequence alignment score, that it is less sensitive to the substitution matrix than the other methods compared, and that, when combined with an SVM, it is among the best peptide classification methods available.
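
For orientation, the Needleman-Wunsch baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: it does not implement the OIT model, whose details are not given in this record. The linear gap penalty, the match/mismatch scoring function `toy_sub`, the helper names, and the example strings are all assumptions made for illustration; a real experiment would use an actual amino-acid substitution matrix.

```python
from typing import Callable, List


def nw_score(a: str, b: str, sub: Callable[[str, str], float], gap: float = -1.0) -> float:
    """Global alignment score (Needleman-Wunsch) with a linear gap penalty."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0.0] * len(b)
        for j in range(1, len(b) + 1):
            curr[j] = max(
                prev[j - 1] + sub(a[i - 1], b[j - 1]),  # substitution or match
                prev[j] + gap,                          # gap in b
                curr[j - 1] + gap,                      # gap in a
            )
        prev = curr
    return prev[-1]


def toy_sub(x: str, y: str) -> float:
    """Hypothetical scoring: +1 for a match, -1 for a mismatch (assumption)."""
    return 1.0 if x == y else -1.0


def similarity_matrix(peptides: List[str], sub: Callable[[str, str], float],
                      gap: float = -1.0) -> List[List[float]]:
    """Pairwise alignment scores between all peptides."""
    return [[nw_score(p, q, sub, gap) for q in peptides] for p in peptides]


# Made-up 8-residue strings, used only to exercise the functions.
example_peptides = ["ACDEFGHI", "ACDEFGHK", "LMNPQRST"]
example_matrix = similarity_matrix(example_peptides, toy_sub)
```

A matrix of this kind is one way a pairwise similarity could be handed to a kernel-based classifier such as an SVM. Raw alignment scores are not guaranteed to form a valid (positive semidefinite) kernel, so some transformation may be needed in practice.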