Discriminative frequent subgraph mining with optimality guarantees

Authors:
Marisa Thoma;Hong Cheng;Arthur Gretton;Jiawei Han;Hans-Peter Kriegel;Alex Smola;Le Song;Philip S. Yu;Xifeng Yan;Karsten M. Borgwardt
Affiliations:
Institute for Informatics, Ludwig-Maximilians-Universität München, Munich, Germany;Department of Systems Engineering and Engineering Management, Chinese University of Hong Kong, Hong Kong, China;School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA;University of Illinois at Urbana-Champaign, Urbana, IL, USA;Institute for Informatics, Ludwig-Maximilians-Universität München, Munich, Germany;Yahoo! Research, Santa Clara, CA, USA;School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA;University of Illinois at Chicago, Chicago, IL, USA;Department of Computer Science, University of California, Santa Barbara, CA, USA;Max Planck Institute for Developmental Biology and Max Planck Institute for Biological Cybernetics, Tübingen, Germany
Venue:
Statistical Analysis and Data Mining
Year:
2010

Abstract

The goal of frequent subgraph mining is to detect subgraphs that occur frequently in a dataset of graphs. In classification settings, one is often interested in discovering discriminative frequent subgraphs, whose presence or absence is indicative of the class membership of a graph. In this article, we propose an approach to feature selection on frequent subgraphs, called CORK, that combines two central advantages. First, it optimizes a submodular quality criterion, which means that greedy feature selection yields a near-optimal solution. Second, our submodular quality criterion can be integrated into gSpan, the state-of-the-art tool for frequent subgraph mining, where it helps to prune the search space for discriminative frequent subgraphs even during frequent subgraph mining. Copyright © 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 302-318, 2010
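
The greedy selection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define the quality criterion, so the score used here is an assumption: it counts "correspondences", meaning pairs of graphs from different classes that the selected subgraph features cannot tell apart, and the greedy loop adds whichever feature reduces that count the most. The names `correspondences`, `greedy_select`, and the toy data are hypothetical and are not taken from the article; the integration into gSpan is not shown.

```python
from collections import Counter


def correspondences(features, selected, pos, neg):
    """Count cross-class graph pairs with identical presence/absence
    patterns on all selected features (assumed quality measure)."""
    def signature(graph):
        return tuple(graph in features[f] for f in selected)

    pos_counts = Counter(signature(g) for g in pos)
    neg_counts = Counter(signature(g) for g in neg)
    # Each positive graph pairs with every negative graph sharing its signature.
    return sum(n * neg_counts[sig] for sig, n in pos_counts.items())


def greedy_select(features, pos, neg, k):
    """Greedily add up to k features, each time taking the feature that
    leaves the fewest correspondences; stop early if nothing improves."""
    selected = []
    current = correspondences(features, selected, pos, neg)
    while len(selected) < k:
        best, best_count = None, current
        for f in features:
            if f in selected:
                continue
            count = correspondences(features, selected + [f], pos, neg)
            if count < best_count:
                best, best_count = f, count
        if best is None:
            break
        selected.append(best)
        current = best_count
    return selected, current


# Toy data (hypothetical): each feature maps to the set of graphs containing it.
pos = ["p1", "p2", "p3"]
neg = ["n1", "n2", "n3"]
features = {
    "X": {"p1", "p2", "n1"},
    "Y": {"p3", "n1"},
    "Z": {"p1", "p2", "p3", "n3"},
}

print(greedy_select(features, pos, neg, k=2))
```

On this toy data the empty feature set leaves all 9 cross-class pairs indistinguishable, and each greedy step lowers that number. If the criterion is submodular, as the abstract states for CORK, this kind of greedy procedure comes with a near-optimality guarantee; the sketch itself does not verify that property.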