MotifMiner: Efficient discovery of common substructures in biochemical molecules

Authors:
Matt Coatney; Srinivasan Parthasarathy
Affiliations:
The Ohio State University, Computer and Information Science, Columbus, OH, USA; The Ohio State University, Department of Computer and Information Science, 395 Dreese Lab, 2015 Neil Ave., Columbus, OH 43210, USA
Venue:
Knowledge and Information Systems
Year:
2005

Abstract

Biochemical research often involves examining structural relationships in molecules, since scientists strongly believe that structure and function are causally related. Traditionally, researchers have identified these structural patterns, or motifs, manually using domain expertise. However, with the massive influx of new biochemical data and the ability to gather data for very large molecules, there is a great need for techniques that automatically and efficiently identify commonly occurring structural patterns in molecules. Previous automated substructure discovery approaches have each introduced variations of similar underlying techniques and have embedded domain knowledge. While embedding domain knowledge improves performance for the particular domain, it complicates extension to other domains. These approaches also do not address scalability or noise, both of which are critical for macromolecules such as proteins. In this paper, we present MotifMiner, a general framework for efficiently identifying common motifs in most scientific molecular datasets. The approach combines structure-based frequent-pattern discovery with search space reduction and coordinate noise handling. We describe the framework and several algorithms, and we demonstrate the flexibility of our system by analyzing protein and drug biochemical datasets.
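
The abstract names three ingredients: structure-based frequent-pattern discovery, search space reduction, and coordinate noise handling. The sketch below is one possible way to illustrate how they might fit together at the smallest pattern size (pairs of atoms). It is not the paper's algorithm. The function names, the `(element, x, y, z)` atom layout, the distance cutoff `max_range`, and the `bin_width` discretization are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import dist  # Python 3.8+


def atom_pair_keys(molecule, max_range=4.0, bin_width=0.5):
    """Return the set of discretized atom-pair keys found in one molecule.

    molecule: list of (element, x, y, z) tuples (hypothetical layout).
    max_range: pairs farther apart than this are skipped, which shrinks
        the search space (assumed form of search space reduction).
    bin_width: distances are rounded down into bins of this width so that
        small coordinate errors usually map to the same key (assumed form
        of noise handling).
    """
    keys = set()
    for (elem_a, *xyz_a), (elem_b, *xyz_b) in combinations(molecule, 2):
        d = dist(xyz_a, xyz_b)
        if d > max_range:
            continue
        # Sort element labels so that (C, O) and (O, C) share one key.
        first, second = sorted((elem_a, elem_b))
        keys.add((first, second, int(d // bin_width)))
    return keys


def frequent_atom_pairs(molecules, min_support=2, **kwargs):
    """Count in how many molecules each key occurs; keep the frequent ones."""
    support = Counter()
    for molecule in molecules:
        # A set per molecule means each molecule counts at most once per key.
        support.update(atom_pair_keys(molecule, **kwargs))
    return {key: n for key, n in support.items() if n >= min_support}


if __name__ == "__main__":
    mol_1 = [("C", 0.0, 0.0, 0.0), ("O", 1.21, 0.0, 0.0), ("N", 0.0, 9.0, 0.0)]
    mol_2 = [("O", 5.0, 5.0, 5.0), ("C", 5.0, 5.0, 6.24), ("H", 20.0, 20.0, 20.0)]
    print(frequent_atom_pairs([mol_1, mol_2]))
```

In this toy input the two C-O distances (1.21 and 1.24) fall into the same bin, so the pair is counted as one pattern with support 2, while every other pair is discarded by the range cutoff. Fixed-width binning is a crude choice: two distances that straddle a bin boundary get different keys, so a real system would need a more careful treatment of noise. Larger motifs could be grown level by level from frequent smaller ones, in the style of Apriori candidate generation.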