Synthesizing Aligned Random Pattern Digraphs from protein sequence patterns

  • Authors: Annie En-Shiun Lee; Andrew K. C. Wong

  • Affiliations: Systems Design Engineering, University of Waterloo, Waterloo, Canada (both authors)

  • Venue: BIBMW '11 Proceedings of the 2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops
  • Year: 2011

Abstract

An essential step in protein function analysis is discovering patterns that represent functional regions in a set of sequences from a protein family. However, the same functional region may vary across the sequences in which it occurs because of biological substitutions, deletions, and insertions. A sequence pattern representing the region therefore seldom repeats at exactly the same position with exactly the same amino acid residues. To capture these variable associations, we developed a pattern synthesis process. First, we use an effective sequence pattern discovery algorithm to discover high-order patterns as input. Next, we group and align similar discovered patterns into Aligned Random Pattern Clusters (ARPCs). During the clustering process, each ARPC is transformed into a probabilistic structural pattern called the Aligned Random Pattern Digraph (ARPD). The synthesis process has three advantages: 1) the synthesized patterns are not confined to a fixed protein region, since the ARPCs capture similar patterns by their variable sites; 2) the ARPDs retain both horizontal pattern associations and vertical site variations; and 3) the search space for synthesizing input patterns is smaller than that for aligning input sequences. Without relying on prior knowledge, our method successfully discovers two functional regions of the Cytochrome Complex protein family: the proximal and distal binding segments, which bind the iron atom of the heme ligand from each side of the heme plane.
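
To make the idea of retaining both horizontal associations and vertical site variations concrete, the following is a minimal sketch in Python, not the authors' algorithm. It assumes the patterns in a cluster have already been aligned to equal length with `-` marking a gap. The function names (`build_digraph`, `transition_probabilities`), the gap symbol, and the toy patterns are all hypothetical and chosen only for illustration. Each aligned column keeps a count of the residues observed there (vertical variation), and each pattern contributes edges between its consecutive non-gap residues (horizontal association).

```python
from collections import Counter, defaultdict

GAP = "-"  # assumed gap symbol for aligned patterns


def build_digraph(aligned_patterns):
    """Return per-column residue counts and edge counts between residues.

    Nodes are (column, residue) pairs. An edge links each non-gap residue
    of a pattern to the next non-gap residue of that same pattern.
    """
    if not aligned_patterns:
        raise ValueError("need at least one aligned pattern")
    width = len(aligned_patterns[0])
    if any(len(p) != width for p in aligned_patterns):
        raise ValueError("aligned patterns must share one length")

    # Vertical site variation: residue counts in each aligned column.
    node_counts = [Counter() for _ in range(width)]
    # Horizontal association: how often one node is followed by another.
    edge_counts = defaultdict(Counter)

    for pattern in aligned_patterns:
        previous = None
        for col, residue in enumerate(pattern):
            if residue == GAP:
                continue  # gaps contribute no node and are skipped by edges
            node_counts[col][residue] += 1
            if previous is not None:
                edge_counts[previous][(col, residue)] += 1
            previous = (col, residue)
    return node_counts, edge_counts


def transition_probabilities(edge_counts):
    """Normalize outgoing edge counts of every node into probabilities."""
    probs = {}
    for source, targets in edge_counts.items():
        total = sum(targets.values())
        probs[source] = {t: c / total for t, c in targets.items()}
    return probs


# Toy cluster invented for illustration; not data from the paper.
cluster = ["CAACH", "CSGCH", "C-ACH", "CAGCH"]
node_counts, edge_counts = build_digraph(cluster)
probs = transition_probabilities(edge_counts)

for col, counts in enumerate(node_counts):
    print(col, dict(counts))
```

In this sketch a column with a single residue behaves like a conserved site, while a column with several residues records the variation at that site. The edge probabilities record which residues tend to follow one another across the aligned patterns. How the paper weights nodes and edges, and how clustering and alignment are performed, is not specified in the abstract and is not modeled here.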