Efficient Discovery of Common Substructures in Macromolecules

Authors:
Srinivasan Parthasarathy;Matt Coatney
Venue:
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Year:
2002

Citing 0
Cited 5

Parallel algorithms for mining frequent structural motifs in scientific data
Proceedings of the 18th annual international conference on Supercomputing

Finding Patterns on Protein Surfaces: Algorithms and Applications to Protein Classification
IEEE Transactions on Knowledge and Data Engineering

Discovering frequent topological structures from graph datasets
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining

Efficient pattern mining on shared memory systems: implications for chip multiprocessor architectures
Proceedings of the 2006 workshop on Memory system performance and correctness

Discovering frequent geometric subgraphs
Information Systems

Quantified Score

Hi-index	0.00

Abstract

Biological macromolecules play a fundamental role in disease; therefore, they are of great interest to fields such as pharmacology and chemical genomics. Yet due to macromolecules' complexity, development of effective techniques for elucidating structure-function macromolecular relationships has been ill explored. Previous techniques have either focused on sequence analysis, which only approximates structure-function relationships, or on small coordinate datasets, which does not scale to large datasets or handle noise. We present a novel scalable approach to efficiently discover macromolecule substructures based on three-dimensional coordinate data, without domain-specific knowledge. The approach combines structure-based frequent pattern discovery with search space reduction and coordinate noise handling. We analyze computational performance compared to traditional approaches, validate that our approach can discover meaningful substructures in noisy macromolecule data by automated discovery of primary and secondary protein structures, and show that our technique is superior to sequence-based approaches at determining structural, and thus functional, similarity between proteins.
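
The abstract names three ingredients (frequent pattern discovery on 3D coordinates, search space reduction, and coordinate noise handling) but does not say how each is implemented. The sketch below is only an illustration of how such ingredients could fit together, not the authors' algorithm. It assumes that noise is absorbed by binning inter-atomic distances, that the search space is reduced by a distance cutoff, and that a pattern is a pair of atom labels plus a distance bin. The function name frequent_distance_patterns and the parameters cutoff, bin_width, and min_support are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import dist


def frequent_distance_patterns(molecules, cutoff=6.0, bin_width=1.0, min_support=2):
    """Count (label, label, distance-bin) patterns that recur across molecules.

    molecules: list of molecules, each a list of (label, (x, y, z)) tuples.
    All parameters are illustrative assumptions, not values from the paper.
    """
    support = Counter()
    for atoms in molecules:
        seen = set()
        for (label_a, pos_a), (label_b, pos_b) in combinations(atoms, 2):
            d = dist(pos_a, pos_b)
            # Assumed search space reduction: ignore pairs beyond the cutoff.
            if d > cutoff:
                continue
            # Assumed noise handling: nearby distances share one coarse bin.
            pattern = (*sorted((label_a, label_b)), int(d // bin_width))
            seen.add(pattern)
        # Each pattern counts at most once per molecule.
        support.update(seen)
    return {p: n for p, n in support.items() if n >= min_support}


# Two toy "molecules" whose N-C distances differ slightly (1.45 vs 1.52).
mols = [
    [("N", (0.0, 0.0, 0.0)), ("C", (1.45, 0.0, 0.0)), ("O", (10.0, 0.0, 0.0))],
    [("N", (5.0, 5.0, 5.0)), ("C", (5.0, 6.52, 5.0)), ("O", (30.0, 5.0, 5.0))],
]
patterns = frequent_distance_patterns(mols)
```

In this toy input both N-C pairs fall into the same distance bin despite the coordinate jitter, while the distant O atoms are pruned by the cutoff. Fixed-width binning is a simplification: two distances just either side of a bin boundary land in different bins, so a real noise-tolerant method would need something more careful.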