Motif Extraction and Protein Classification

  • Authors:
  • Vered Kunik;Zach Solan;Shimon Edelman;Eytan Ruppin;David Horn

  • Affiliations:
  • Tel Aviv University;Tel Aviv University;Cornell University;Tel Aviv University;Tel Aviv University

  • Venue:
  • CSB '05 Proceedings of the 2005 IEEE Computational Systems Bioinformatics Conference
  • Year:
  • 2005

Abstract

We present a novel unsupervised method for extracting meaningful motifs from biological sequence data. This de novo motif extraction (MEX) algorithm is data-driven and finds motifs that are not necessarily over-represented in the data. Applying MEX to the oxidoreductase class of enzymes, which contains approximately 7000 enzyme sequences, yields a relatively small set of motifs. This set spans a motif-space that is used for functional classification of the enzymes by an SVM classifier. The classification based on MEX motifs surpasses that of two other SVM-based methods: SVMProt, a method based on the analysis of physical-chemical properties of a protein generated from its sequence of amino acids, and an SVM applied to a Smith-Waterman distance matrix. Our findings demonstrate that the MEX algorithm extracts relevant motifs, supporting a successful sequence-to-function classification.
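To make the idea of a motif-space concrete, the following is a minimal sketch of one way a sequence could be mapped to a feature vector once a motif set is available. It is an illustration only, not the MEX algorithm or the authors' pipeline: the binary presence/absence encoding, the function names `motif_vector` and `embed`, and the toy motifs and sequences are all assumptions introduced here. The resulting vectors are the kind of input a classifier such as an SVM could be trained on; the training step itself is omitted.

```python
from typing import List


def motif_vector(sequence: str, motifs: List[str]) -> List[int]:
    """Encode one sequence as a binary vector over the motif set.

    Entry i is 1 if motifs[i] occurs as a substring of the sequence, else 0.
    This presence/absence encoding is an assumption made for illustration.
    """
    return [1 if motif in sequence else 0 for motif in motifs]


def embed(sequences: List[str], motifs: List[str]) -> List[List[int]]:
    """Map every sequence to its point in the (hypothetical) motif-space."""
    return [motif_vector(seq, motifs) for seq in sequences]


# Hypothetical toy data: these motifs and sequences are made up and are
# not taken from the paper or from any enzyme database.
motifs = ["GAGK", "CPHC", "HDE"]
sequences = ["MKGAGKTTLCPHCA", "MSTHDELLGAGKV", "MAAAA"]

features = embed(sequences, motifs)
for seq, vec in zip(sequences, features):
    print(seq, vec)
```

Each row of `features` has one coordinate per motif, so the dimensionality of the representation equals the size of the motif set, which is why a relatively small set of motifs keeps the classification problem compact.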