Classification tree based protein structure distances for testing sequence-structure correlation

  • Authors:
  • Elias Zintzaras

  • Affiliations:
  • Department of Biomathematics, University of Thessaly School of Medicine, Larissa, Greece and Institute for Clinical Research and Health Policy Studies, Tufts-New England Medical Center, Tufts Univ ...

  • Venue:
  • Computers in Biology and Medicine
  • Year:
  • 2008

Abstract

A methodology for testing the correlation between the sequence and structure distances of proteins is proposed. Structure distances were derived by applying a forward-growing classification tree algorithm to defined physico-chemical and geometrical properties of the structures. The structure distance for every pair of proteins was defined as the number of intermediate nodes between them in the tree. Sequence distances were derived using pairwise sequence alignment. The correlation between the structure distance matrix and the sequence distance matrix was then tested using a Monte Carlo permutation test. The results were compared with those obtained when the double dynamic structure alignment method (SSAP) was applied. The methodology was applied to a data set of 74 proteins belonging to 14 families. The classification tree was able to identify the protein families (the misclassification rate was R=1.4%), and a 74×74 structure distance matrix was produced. For every pair of protein sequences a dissimilarity score was recorded, and a sequence distance matrix was produced. The Monte Carlo permutation test produced a correlation coefficient r=0.403 (P
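
A Monte Carlo permutation test of the correlation between two distance matrices, as described in the abstract, can be sketched as follows. This is a minimal illustration and not the author's implementation: it assumes a Mantel-style procedure in which the rows and columns of one matrix are permuted jointly, Pearson correlation over the upper triangle as the statistic, and a one-sided P value. The function name `mantel_test`, its parameters, and the toy data are hypothetical.

```python
import numpy as np


def mantel_test(d1, d2, n_perm=999, seed=0):
    """Mantel-style permutation test between two square distance matrices.

    Assumptions (not taken from the paper): Pearson correlation on the
    upper triangle, and a one-sided test for positive correlation.
    Returns (observed correlation, permutation P value).
    """
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    if d1.ndim != 2 or d1.shape[0] != d1.shape[1] or d1.shape != d2.shape:
        raise ValueError("d1 and d2 must be square matrices of equal size")

    n = d1.shape[0]
    upper = np.triu_indices(n, k=1)  # each pair of items counted once
    x = d1[upper]
    r_obs = np.corrcoef(x, d2[upper])[0, 1]

    rng = np.random.default_rng(seed)
    at_least_as_large = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        # Permute rows and columns together so the matrix stays a valid
        # distance matrix over relabelled items.
        d2_perm = d2[np.ix_(perm, perm)]
        r_perm = np.corrcoef(x, d2_perm[upper])[0, 1]
        if r_perm >= r_obs:
            at_least_as_large += 1

    # The observed arrangement is counted as one of the permutations.
    p_value = (at_least_as_large + 1) / (n_perm + 1)
    return r_obs, p_value


# Toy usage with made-up data: two noisy views of the same 10 points.
rng = np.random.default_rng(1)
points = rng.normal(size=(10, 3))
noisy = points + rng.normal(scale=0.1, size=points.shape)
dist_a = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
dist_b = np.linalg.norm(noisy[:, None] - noisy[None, :], axis=-1)
r, p = mantel_test(dist_a, dist_b, n_perm=199)
print(f"r = {r:.3f}, P = {p:.3f}")
```

Permuting item labels rather than individual matrix entries matters because the entries of a distance matrix are not independent: every item contributes to a whole row and column, so shuffling entries freely would understate the variability of the statistic under the null hypothesis.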