MotifMiner: A General Toolkit for Efficiently Identifying Common Substructures in Molecules

Authors:
Matt Coatney; Srinivasan Parthasarathy
Venue:
BIBE '03 Proceedings of the 3rd IEEE Symposium on BioInformatics and BioEngineering
Year:
2003

Cited by (3):

Parallel algorithms for mining frequent structural motifs in scientific data
Proceedings of the 18th annual international conference on Supercomputing

A Services Oriented Framework for Next Generation Data Analysis Centers
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 10 - Volume 11

Finding Patterns on Protein Surfaces: Algorithms and Applications to Protein Classification
IEEE Transactions on Knowledge and Data Engineering

Abstract

Scientific research often involves examining structural relationships in molecules, since scientists strongly believe that structure and function are causally related. Traditionally, researchers have identified these structural patterns, or motifs, manually using biochemical expertise. However, with the massive influx of new biochemical data and the ability to gather data for very large molecules, there is a great need for techniques that automatically and efficiently identify commonly occurring structural patterns in molecules. Previous automated substructure discovery approaches have each introduced variations of similar underlying techniques and have embedded domain knowledge. While embedding domain knowledge improves performance for the particular domain, it complicates extension to other domains. These approaches also do not address scalability or noise, both of which are critical for certain structural domains such as macromolecules. In this paper, we present MotifMiner, a general toolkit for automatically identifying common motifs in almost any scientific molecular dataset. We describe our application framework and its services for identifying motifs, and we demonstrate the flexibility of the system by analyzing several disparate domains, including protein, drug, and molecular dynamics (MD) simulation datasets.