Coding of amino acids by texture descriptors

  • Authors:
  • Loris Nanni;Alessandra Lumini

  • Affiliations:
  • Department of Electronic, Informatics and Systems (DEIS), Universití di Bologna, Via Venezia 52, 47023 Cesena, Italy;Department of Electronic, Informatics and Systems (DEIS), Universití di Bologna, Via Venezia 52, 47023 Cesena, Italy

  • Venue:
  • Artificial Intelligence in Medicine
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

Objective: In this paper we propose a new feature extractor for peptide/protein classification based on the calculation of texture descriptors. Representing a peptide/protein using a matrix descriptor, instead of a vector, allows to deal with the peptide/protein as an image and to use texture descriptors for representation purposes. Methods and materials: A matrix descriptor, which is a squared matrix of the dimension of the peptide/protein, is obtained considering a partial ordering of the amino acids of the peptide/protein according to their value of a given physicochemical property. Each matrix descriptor is considered as a texture image and several texture descriptors are considered to obtain a compact representation which is scale invariant (i.e. independent on the length of the peptide\protein). The texture descriptors tested in this work are: local binary patterns (LBP), discrete cosine transform (DCT) and Daubechies wavelets. Results and conclusion: The experimental section reports several tests, aimed at supporting our ideas, performed on the following datasets: vaccine dataset for the predictions of peptides that bind human leukocyte antigens; human immunodeficiency virus (HIV-1) protease cleavage site prediction dataset and membrane proteins type dataset. The experimental results confirm the usefulness of the novel descriptors: the performance obtained by our system on the three difficult datasets is quite high, indicating that the proposed method is a feasible system for extracting information from peptides and proteins. The performance obtained by each of the three texture descriptors calculated from the matrix-based representation, and coupled to a support vector machine classifier, is lower than the performance obtained by other vector-based descriptors based on physicochemical properties proposed in the literature. Anyway the new descriptors bring different information and our tests show that the texture descriptors and the vector-based descriptors can be combined to improve the overall performance of the system. In particular the proposed approach improves the state-of-the-art results in two out of three tested problems (HIV-1 protease cleavage site prediction dataset and membrane proteins type dataset).