Mining Protein Sequence Motifs Representing Common 3D Structures

Authors:
Wei Zhong;Gulsah Altum;Robert Harrison;Phang C. Tai;Yi Pan
Affiliations:
Georgia State University, Atlanta;Georgia State University, Atlanta;Georgia State University, Atlanta;Georgia State University, Atlanta;Georgia State University, Atlanta
Venue:
CSBW '05 Proceedings of the 2005 IEEE Computational Systems Bioinformatics Conference - Workshops
Year:
2005

Abstract

Understanding the relationship between protein sequence and structure is one of the most important tasks in current bioinformatics research. In this work, recurring protein sequence motifs are explored with a K-means clustering algorithm. No structural information is used during clustering, so the relationship between sequence similarity and structural similarity can be studied for purely sequence-based clusters. The work focuses on characterizing structural similarity so that the quality of sequence clusters can be assessed accurately. Analysis of the results reveals that a combined metric, the distance matrix root mean squared deviation for sequence clusters (dmRMSD_SC) together with the torsion angle RMSD for sequence clusters (taRMSD_SC), provides a reliable indication of structural similarity within sequence clusters. Based on this combined metric, recurrent sequence clusters with high structural similarity are used to generate sequence motifs. The common 3D structure of a sequence motif is represented by both the representative backbone torsion angles and the average distance matrices of the sequence cluster used to produce the motif. These motifs provide a foundation for developing a protein vocabulary that reflects sequence-structure correspondence.
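
The abstract names two structural measures but does not give their formulas. As an illustration only, the following Python sketch shows one common way to compute a distance-matrix RMSD and a torsion-angle RMSD between two equal-length backbone segments. The function names (`dm_rmsd`, `ta_rmsd`, `angle_diff`), the use of one 3D point per residue, and the choice to average over all residue pairs are assumptions made for this sketch; how the paper aggregates such values over a whole sequence cluster to obtain dmRMSD_SC and taRMSD_SC is not stated here and is not reproduced.

```python
import math
from itertools import combinations


def dm_rmsd(coords_a, coords_b):
    """Distance-matrix RMSD between two equal-length segments.

    Each segment is a list of (x, y, z) points, one per residue (assumed
    here to be C-alpha positions). The measure compares intra-segment
    distances, so no structural superposition is needed.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("segments must have the same length")
    if len(coords_a) < 2:
        raise ValueError("need at least two residues")
    pairs = list(combinations(range(len(coords_a)), 2))
    total = 0.0
    for i, j in pairs:
        d_a = math.dist(coords_a[i], coords_a[j])
        d_b = math.dist(coords_b[i], coords_b[j])
        total += (d_a - d_b) ** 2
    return math.sqrt(total / len(pairs))


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def ta_rmsd(torsions_a, torsions_b):
    """Torsion-angle RMSD over per-residue (phi, psi) pairs, in degrees.

    Differences are wrapped so that 170 and -170 degrees count as
    20 degrees apart rather than 340.
    """
    if len(torsions_a) != len(torsions_b) or not torsions_a:
        raise ValueError("need two non-empty segments of equal length")
    total = 0.0
    for (phi_a, psi_a), (phi_b, psi_b) in zip(torsions_a, torsions_b):
        total += angle_diff(phi_a, phi_b) ** 2 + angle_diff(psi_a, psi_b) ** 2
    return math.sqrt(total / (2 * len(torsions_a)))
```

Because `dm_rmsd` depends only on internal distances, translating or rotating one segment leaves the value unchanged, which is one reason a distance-matrix measure is convenient for comparing many segments within a cluster.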