Efficient discovery of structural motifs from protein sequences with combination of flexible intra- and inter-block gap constraints

  • Authors:
  • Chen-Ming Hsu;Chien-Yu Chen;Ching-Chi Hsu;Baw-Jhiune Liu

  • Affiliations:
  • Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, Taiwan, R.O.C.;Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, Taipei, Taiwan, R.O.C.;Institute for Information Industry, Taipei, Taiwan, R.O.C.;Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, Taiwan, R.O.C.

  • Venue:
  • PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2006

Abstract

Discovering protein structural signatures directly from primary sequences is a challenging task, because the residues associated with a functional motif are not necessarily clustered in one region of the sequence. This work proposes an algorithm that aims to discover conserved sequential blocks interleaved by large irregular gaps from a set of unaligned biological sequences. Unlike previous work that employs only one type of constraint on gap flexibility, we propose using a combination of intra- and inter-block gap constraints to discover longer patterns with larger irregular gaps. The smaller flexible intra-block gap constraint relaxes the restriction within local motif blocks while still keeping them compact, and the larger flexible inter-block gap constraint allows longer irregular gaps between compact motif blocks. Using two types of gap constraints for different purposes improves the efficiency of the mining process while maintaining high accuracy of the mining results. The efficiency of the algorithm also helps to identify functional motifs that are conserved in only a small subset of the input sequences.
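
To make the two kinds of gap constraints concrete, the following is a minimal illustrative sketch, not the authors' mining algorithm. It only checks whether a given block pattern occurs in a sequence and counts how many sequences support it. The function names (`pattern_to_regex`, `support`), the parameters (`max_intra`, `min_inter`, `max_inter`), and the example sequences are all hypothetical, and translating the pattern into a regular expression is an assumption made purely for illustration.

```python
import re

def pattern_to_regex(blocks, max_intra, max_inter, min_inter=0):
    """Compile a block pattern into a regular expression.

    blocks    : list of strings, each a compact motif block (e.g. "CH").
    max_intra : maximum number of arbitrary residues allowed between
                consecutive residues inside one block (small gap).
    min_inter, max_inter : bounds on the number of arbitrary residues
                allowed between consecutive blocks (large gap).
    All names here are hypothetical and chosen for illustration only.
    """
    intra_gap = ".{0,%d}" % max_intra
    inter_gap = ".{%d,%d}" % (min_inter, max_inter)
    # Residues within a block are joined by the small intra-block gap.
    block_exprs = [intra_gap.join(re.escape(r) for r in block) for block in blocks]
    # Blocks are joined by the larger inter-block gap.
    return re.compile(inter_gap.join(block_exprs))

def support(sequences, blocks, max_intra, max_inter, min_inter=0):
    """Count the sequences that contain at least one occurrence of the pattern."""
    rx = pattern_to_regex(blocks, max_intra, max_inter, min_inter)
    return sum(1 for seq in sequences if rx.search(seq))

# Made-up toy sequences, not real protein data.
toy_sequences = [
    "AACXHKKKKKKGDAA",        # intra gap 1, inter gap 6
    "CHGD",                   # all gaps 0
    "CXXHGD",                 # intra gap 2, too large when max_intra=1
    "CH" + "K" * 11 + "GD",   # inter gap 11, too large when max_inter=10
]
toy_support = support(toy_sequences, ["CH", "GD"], max_intra=1, max_inter=10)
```

In this sketch the small intra-block bound keeps each block compact, while the separate and larger inter-block bound lets the blocks sit far apart, which mirrors the distinction drawn in the abstract. How the paper actually enumerates and prunes candidate patterns is not described here and is not reproduced.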