Protein motifs retrieval by SS terns occurrences

  • Authors:
  • V. Cantoni; A. Ferone; O. Ozbudak; A. Petrosino

  • Affiliations:
  • University of Pavia, Department of Electrical and Computer Engineering, Via A. Ferrata, 1, 27100 Pavia, Italy; University of Naples Parthenope, Department of Applied Science, Centro Direzionale Isola C4, 80133 Napoli, Italy; Istanbul Technical University, Department of Electronics and Communication Engineering, 34469 Istanbul, Turkey; University of Naples Parthenope, Department of Applied Science, Centro Direzionale Isola C4, 80133 Napoli, Italy

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2013

Abstract

This paper describes a new approach to the analysis of protein 3D structure based on the Secondary Structure (SS) representation, with a focus on structural motif retrieval. The strategy is derived from the Generalized Hough Transform (GHT), but uses the triplet of SSs as the structural primitive element. The identity of a triplet is evaluated on the triangle whose vertices are the SS midpoints, and is represented by the three distances between those midpoints. The motif is characterized by its complete set of triplets, so the Reference Table (RT) has one tuple per triplet. Besides the discriminant component (the three edge lengths), each tuple contains the mapping rule, i.e. the location of the Reference Point (RP) relative to the triplet. In the macromolecule to be analyzed, each possible triplet is looked up in the RT, and every match contributes a vote to a candidate location of the RP. The presence and location of the searched motif are confirmed when the number of contributions collected at a location equals the RT cardinality (i.e. the number of motif triplets); this equality holds only in the absence of noise and ambiguities. The approach is tested on twenty proteins selected randomly from the PDB, with differing numbers of SSs ranging from 14 to 46. The retrieval of all possible structural blocks composed of three, four and five SSs (very compact and completely distributed) has been conducted. The results show good performance in terms of precision and computation time.
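The voting scheme outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the RP is taken to be the centroid of the motif's SS midpoints, the mapping rule is stored as the RP coordinates in a right-handed frame attached to each triangle, vertices are matched by sorting on the opposite edge length, and the names `build_reference_table`, `vote_for_reference_point`, `tol` and `cell` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)

def canonical_triplet(p):
    """Order the three vertices by the length of their opposite edge.

    Returns the reordered vertices and the ascending edge lengths
    (the discriminant component of an RT tuple)."""
    opposite = [math.dist(p[1], p[2]), math.dist(p[0], p[2]), math.dist(p[0], p[1])]
    order = sorted(range(3), key=lambda i: opposite[i])
    return [p[i] for i in order], [opposite[i] for i in order]

def local_frame(p, eps=1e-9):
    """Right-handed orthonormal frame on an ordered triangle (None if collinear)."""
    x = sub(p[1], p[0])
    z = cross(x, sub(p[2], p[0]))
    if math.sqrt(dot(z, z)) < eps:
        return None
    x, z = unit(x), unit(z)
    return p[0], (x, cross(z, x), z)

def build_reference_table(motif):
    """One tuple per motif triplet: (edge lengths, RP in the triangle's frame)."""
    n = len(motif)
    rp = tuple(sum(c) / n for c in zip(*motif))  # assumed RP: centroid of midpoints
    table = []
    for tri in combinations(motif, 3):
        pts, lengths = canonical_triplet(tri)
        frame = local_frame(pts)
        if frame is None:
            continue
        origin, axes = frame
        d = sub(rp, origin)
        table.append((lengths, tuple(dot(ax, d) for ax in axes)))
    return table

def vote_for_reference_point(protein, table, tol=0.5, cell=1.0):
    """Accumulate RP votes on a cubic grid; tol and cell are illustrative values."""
    votes = defaultdict(int)
    for tri in combinations(protein, 3):
        pts, lengths = canonical_triplet(tri)
        frame = local_frame(pts)
        if frame is None:
            continue
        origin, axes = frame
        for ref_lengths, local in table:
            if all(abs(a - b) <= tol for a, b in zip(lengths, ref_lengths)):
                # Map the stored local coordinates back to the global frame.
                rp = tuple(origin[i] + sum(local[k] * axes[k][i] for k in range(3))
                           for i in range(3))
                votes[tuple(round(c / cell) for c in rp)] += 1
    return votes
```

Under these assumptions, a grid cell whose vote count reaches `len(table)` marks a candidate motif location. Isosceles or equilateral triangles make the vertex ordering ambiguous and would need extra handling, which this sketch omits.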