Latent structure pattern mining

Authors:
Andreas Maunz;Christoph Helma;Tobias Cramer;Stefan Kramer
Affiliations:
Freiburg Center for Data Analysis and Modeling, Freiburg im Breisgau, Germany;in-silico Toxicology, Basel, Switzerland;Freiburg Center for Data Analysis and Modeling, Freiburg im Breisgau, Germany;Institut für Informatik, Technische Universität München, Garching bei München, Germany
Venue:
ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part II
Year:
2010

Abstract

Pattern mining methods for graph data have largely been restricted to ground features, such as frequent or correlated subgraphs. Kazius et al. have demonstrated the use of elaborate patterns in the biochemical domain, each summarizing several ground features at once. Such patterns have the potential to reveal latent information not present in any individual ground feature. However, those patterns were handcrafted by chemical experts. In this paper, we present a data-driven, bottom-up method for pattern generation that takes advantage of the embedding relationships among individual ground features. The method works fully automatically and does not require data preprocessing (e.g., to introduce abstract node or edge labels). By controlling the process of generating ground features, it is possible to align them canonically and merge (stack) them, yielding a weighted edge graph. In a subsequent step, the subgraph features can be further reduced by singular value decomposition (SVD). Our experiments show that the resulting features enable substantial performance improvements on chemical datasets that have so far been problematic for graph mining approaches.
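
The SVD reduction step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the subgraph features are encoded as a binary occurrence matrix with one row per graph and one column per feature, and the function name `reduce_features`, the toy matrix, and the choice of `k` are all hypothetical.

```python
import numpy as np

def reduce_features(X, k):
    """Project the rows of X onto the top-k right singular vectors.

    Assumption (not from the paper): X is a graphs-by-features
    occurrence matrix.
    """
    X = np.asarray(X, dtype=float)
    # Thin SVD: X = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(k, len(s))
    # Coordinates of each graph in the k-dimensional latent space,
    # plus the k directions (rows of Vt) that span that space.
    return U[:, :k] * s[:k], Vt[:k]

# Hypothetical toy data: 4 graphs (rows) x 5 subgraph features (columns);
# 1 means the feature occurs in the graph.
X = np.array([
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
])

Z, components = reduce_features(X, k=2)
print(Z.shape)  # (4, 2)
```

Each row of `Z` is a reduced representation of one graph that could be passed to a classifier in place of the original feature columns. How the paper actually constructs the matrix from the stacked patterns is not specified in the abstract.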