On Utilizing Optimal and Information Theoretic Syntactic Modeling for Peptide Classification

Authors:
Eser Aygün;B. John Oommen;Zehra Cataltepe
Affiliations:
Department of Computer Eng., Istanbul Technical University, Istanbul, Turkey;School of Computer Science, Carleton University, Ottawa, Canada K1S 5B6 and Adjunct Professor at the University of Agder in Grimstad, Norway;Department of Computer Eng., Istanbul Technical University, Istanbul, Turkey
Venue:
PRIB '09 Proceedings of the 4th IAPR International Conference on Pattern Recognition in Bioinformatics
Year:
2009

Abstract

Syntactic methods in pattern recognition have been used extensively in bioinformatics, in particular in the analysis of gene and protein expression and in the recognition and classification of bio-sequences. These methods are almost universally distance-based. This paper concerns the use of an Optimal and Information Theoretic (OIT) probabilistic model [11] to classify peptides using the information residing in their syntactic representations. Traditionally, such classification has relied on the edit distances computed when the respective peptides are compared. We advocate modeling the differences between the compared strings as a mutation process consisting of random Substitutions, Insertions and Deletions (SID) that obeys the OIT model. We show that the probability measure obtained from the OIT model can be regarded as a sequence similarity metric, on which a Support Vector Machine (SVM)-based peptide classifier, referred to as OIT_SVM, can be built. The classifier has been tested with eight different "substitution" matrices on two data sets, namely the HIV-1 Protease Cleavage sites and the T-cell Epitopes. The results show that the OIT model performs significantly better than a classifier that uses the Needleman-Wunsch sequence alignment score, and better than the peptide classification methods previously evaluated on the same two data sets.
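For orientation, the Needleman-Wunsch baseline named in the abstract can be sketched as below. This is a minimal illustration and not the authors' implementation. It assumes a flat match/mismatch score and a linear gap penalty in place of the substitution matrices used in the paper, and the function names, parameter values and peptide strings are all hypothetical. It does not implement the OIT model itself.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score of strings a and b with a linear gap penalty.

    Assumption: a flat match/mismatch scheme stands in for a real
    amino-acid substitution matrix.
    """
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against the empty prefix of b
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # substitution or match
                            prev[j] + gap,       # gap in b (deletion from a)
                            curr[j - 1] + gap))  # gap in a (insertion from b)
        prev = curr
    return prev[-1]


def similarity_matrix(peptides, **scoring):
    """Pairwise alignment scores for a list of peptide strings."""
    return [[needleman_wunsch_score(p, q, **scoring) for q in peptides]
            for p in peptides]


if __name__ == "__main__":
    # Hypothetical short peptides, for illustration only.
    for row in similarity_matrix(["ACD", "AD", "ACDE"]):
        print(row)
```

A pairwise score matrix of this kind is one way to supply sequence similarities to a kernel-based classifier such as an SVM. In the paper's approach, the probability measure from the OIT model takes the place of the alignment score.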