Machine Discovery of Protein Motifs

  • Authors: Darrell Conklin

  • Affiliations: ZymoGenetics Inc., 1201 Eastlake Avenue East, Seattle, WA 98102. conklin@zgi.com

  • Venue: Machine Learning - Special issue on applications in molecular biology
  • Year: 1995

Abstract

The investigation of relations between protein tertiary structure and amino acid sequence is a topic of tremendous importance in molecular biology. The automated discovery of recurrent patterns of structure and sequence is an essential part of this investigation. These patterns, known as protein motifs, are abstractions of fragments drawn from proteins of known sequence and tertiary structure. This paper has two objectives. The first is to introduce and define protein motifs, and provide a survey of previous research on protein motif discovery. The second is to present and apply a novel approach to protein motif representation and discovery, which is based on a spatial description logic and the symbolic machine learning paradigm of structured concept formation. A large database of protein fragments is processed using this approach, and several interesting and significant protein motifs are discovered.