Predicting protein structural class from closed protein sequences

  • Authors:
  • N. Rattanakronkul; T. Wattarujeekrit; K. Waiyamai

  • Affiliations:
  • Knowledge Discovery from very Large database research group, KDL, Computer Engineering Department, Kasetsart University, Thailand (all authors)

  • Venue:
  • PAKDD'03 Proceedings of the 7th Pacific-Asia conference on Advances in knowledge discovery and data mining
  • Year:
  • 2003

Abstract

ProsMine is a system that automatically predicts protein structural class from sequence using a combination of data mining techniques. Unlike our previous protein structural class prediction system, which could handle only enzyme proteins, ProsMine can predict the structural class of any protein. We investigate the most effective way to represent protein sequences in the new prediction system. Drawing on lattice theory, our idea is to discover the set of Closed Sequences in the protein sequence database and use the appropriate Closed Sequences as protein features. A sequence is said to be "closed" for a given protein sequence database if it is the maximal subsequence of all the protein sequences in that database. Efficient algorithms have been proposed for discovering closed sequences and for selecting the appropriate closed sequences for each protein structural class. Experimental results, using data extracted from the SWISS-PROT and CATH databases, show that ProsMine yields better accuracy than our previous work, even at the most specific level (Homologous-Superfamily level) of the CATH protein structure hierarchy, which consists of 637 possible classes.
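
The notion of a closed sequence can be illustrated with a small sketch. It is not the ProsMine algorithm, which the abstract does not describe. It assumes the usual closed-pattern definition from data mining (a pattern is closed if no longer pattern containing it occurs in the same number of database sequences), and it simplifies "subsequence" to contiguous substring. The function names, the `min_support` and `max_len` parameters, the toy sequences, and the binary presence encoding in `to_features` are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def support_counts(sequences, max_len):
    """Count, for each contiguous substring of length <= max_len,
    how many sequences contain it at least once."""
    counts = defaultdict(int)
    for seq in sequences:
        seen = set()
        for i in range(len(seq)):
            for j in range(i + 1, min(len(seq), i + max_len) + 1):
                seen.add(seq[i:j])
        for sub in seen:
            counts[sub] += 1
    return counts


def closed_substrings(sequences, min_support=2, max_len=6):
    """Return {substring: support} for frequent substrings that are closed,
    i.e. every one-residue extension occurs in strictly fewer sequences.
    (Assumed definition; contiguous substrings only.)"""
    # Count one extra residue so that extensions of length max_len + 1 are known.
    counts = support_counts(sequences, max_len + 1)
    alphabet = set("".join(sequences))
    closed = {}
    for sub, sup in counts.items():
        if len(sub) > max_len or sup < min_support:
            continue
        extensions = [a + sub for a in alphabet] + [sub + a for a in alphabet]
        if all(counts.get(ext, 0) < sup for ext in extensions):
            closed[sub] = sup
    return closed


def to_features(sequence, patterns):
    """Hypothetical encoding: 1 if the pattern occurs in the sequence, else 0."""
    return [int(p in sequence) for p in patterns]


if __name__ == "__main__":
    toy_db = ["MKVLA", "MKVLG", "AKVLA"]  # made-up sequences, not real proteins
    patterns = closed_substrings(toy_db, min_support=2, max_len=5)
    print(patterns)
    print(to_features("AKVLG", sorted(patterns)))
```

On the toy database, `KVL` is closed with support 3, while `KV` is not, because its extension `KVL` occurs in the same three sequences. `MKVL` and `KVLA` are closed with support 2. Dropping non-closed patterns in this way removes features that carry no information beyond a longer pattern with identical occurrences.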