Efficient Discovery of Common Substructures in Macromolecules

  • Authors:
  • Srinivasan Parthasarathy;Matt Coatney

  • Venue:
  • ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
  • Year:
  • 2002

Abstract

Biological macromolecules play a fundamental role in disease; therefore, they are of great interest to fields such as pharmacology and chemical genomics. Yet due to macromolecules' complexity, development of effective techniques for elucidating structure-function macromolecular relationships has been ill explored. Previous techniques have either focused on sequence analysis, which only approximates structure-function relationships, or on small coordinate datasets, which does not scale to large datasets or handle noise. We present a novel scalable approach to efficiently discover macromolecule substructures based on three-dimensional coordinate data, without domain-specific knowledge. The approach combines structure-based frequent pattern discovery with search space reduction and coordinate noise handling. We analyze computational performance compared to traditional approaches, validate that our approach can discover meaningful substructures in noisy macromolecule data by automated discovery of primary and secondary protein structures, and show that our technique is superior to sequence-based approaches at determining structural, and thus functional, similarity between proteins.