Mining Protein Sequence Motifs Representing Common 3D Structures

  • Authors:
  • Wei Zhong; Gulsah Altum; Robert Harrison; Phang C. Tai; Yi Pan

  • Affiliations:
  • Georgia State University, Atlanta (all authors)

  • Venue:
  • CSBW '05 Proceedings of the 2005 IEEE Computational Systems Bioinformatics Conference - Workshops
  • Year:
  • 2005

Abstract

Understanding the relationship between a protein's structure and its sequence is one of the most important tasks of current bioinformatics research. In this work, recurring protein sequence motifs are explored with a K-means clustering algorithm. No structural information is used during the clustering process, so the relationship between sequence similarity and structural similarity can be studied for purely sequence-based clusters. This work focuses on characterizing structural similarity so that the quality of sequence clusters can be assessed accurately. Analysis of the results reveals that a combined metric, consisting of the distance matrix root mean squared deviation for sequence clusters (dmRMSD_SC) and the torsion angle RMSD for sequence clusters (taRMSD_SC), provides a reliable indication of structural similarity within a sequence cluster. Based on this combined metric, the recurrent sequence clusters with high structural similarity are used to generate sequence motifs. The common 3D structure of a sequence motif is represented by both the representative backbone torsion angles and the average distance matrices of the sequence cluster used to produce the motif. These motifs provide a foundation for developing a protein vocabulary that reflects sequence-structure correspondence.
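The two structural measures named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: `dm_rmsd` compares the pairwise distance matrices of two equal-length segments, `ta_rmsd` compares torsion angles in degrees with wrap-around at ±180, and `cluster_mean` averages a pairwise metric over all member pairs of a cluster as one plausible cluster-level ("_SC") summary. All function names are hypothetical and do not come from the paper.

```python
import itertools
import numpy as np


def distance_matrix(coords):
    """Pairwise Euclidean distances between the points of one segment (n x 3)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def dm_rmsd(coords_a, coords_b):
    """RMSD between the upper triangles of two distance matrices.

    Assumes both segments have the same number of points.
    """
    da = distance_matrix(coords_a)
    db = distance_matrix(coords_b)
    iu = np.triu_indices(da.shape[0], k=1)  # each unordered pair once
    return float(np.sqrt(np.mean((da[iu] - db[iu]) ** 2)))


def ta_rmsd(angles_a, angles_b):
    """RMSD between two torsion-angle vectors in degrees.

    Differences are wrapped into [-180, 180) so that 179 and -179
    count as 2 degrees apart rather than 358.
    """
    a = np.asarray(angles_a, dtype=float)
    b = np.asarray(angles_b, dtype=float)
    delta = (a - b + 180.0) % 360.0 - 180.0
    return float(np.sqrt(np.mean(delta ** 2)))


def cluster_mean(members, metric):
    """Average a pairwise metric over all member pairs of one cluster.

    This averaging is an assumed cluster-level summary, not the paper's
    stated definition of dmRMSD_SC or taRMSD_SC.
    """
    values = [metric(a, b) for a, b in itertools.combinations(members, 2)]
    return float(np.mean(values))


# Toy data: three 3-point segments (not real protein coordinates).
seg = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
shifted = seg + np.array([5.0, -2.0, 1.0])  # same shape, translated
stretched = seg * 2.0                       # different internal distances

dm_cluster = cluster_mean([seg, shifted, stretched], dm_rmsd)
ta_pair = ta_rmsd([179.0, -60.0], [-179.0, -60.0])
```

Because a distance matrix depends only on internal distances, `dm_rmsd(seg, shifted)` is zero without any superposition step, which is one reason a distance-matrix measure is convenient for comparing many segments. Under these assumptions, low values of both cluster-level measures would mark a sequence cluster as structurally consistent.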