Pattern discovery from graph-structured data: a data mining perspective

Authors:
Hiroshi Motoda
Affiliations:
Asian Office of Aerospace Research & Development, Air Force Office of Scientific Research, Tokyo, Japan
Venue:
IEA/AIE'07 Proceedings of the 20th international conference on Industrial, engineering, and other applications of applied intelligent systems
Year:
2007



Abstract

Mining from graph-structured data has its roots in concept formation, and recent advances in data mining techniques have broadened its applicability. Graph mining must contend with subgraph isomorphism, which is known to be NP-complete. This paper revisits two contrasting approaches from our work on extracting frequent subgraphs: one based on complete search (AGM) and the other on heuristic search (GBI). Both use canonical labelling to deal with subgraph isomorphism. AGM represents a graph by its adjacency matrix and employs an Apriori-like bottom-up search algorithm that exploits the anti-monotonicity of frequency. It can handle both connected and disconnected graphs, and has been extended to handle tree data and sequential data by incorporating a different bias for each in the joining operators. It has also been extended to incorporate a taxonomy over labels in order to extract generalized subgraphs. GBI employs a notion of chunking, which recursively chunks two adjoining nodes and thus generates fairly large subgraphs at an early stage of the search. The recent improved version employs pseudo-chunking, called chunkingless chunking, which enables the extraction of overlapping subgraphs. It can impose two kinds of constraints to accelerate the search: one to include one or more of the designated subgraphs and the other to exclude all of the designated subgraphs. It has been extended to extract paths and trees from graph data by placing a restriction on pseudo-chunking operations. GBI can further be used as a feature constructor in decision tree building. The paper explains how both GBI and AGM, together with their extended versions, can be applied to solve various data mining problems that are difficult to solve by other methods.
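To make the role of canonical labelling concrete, the following is a minimal sketch in Python, not the coding scheme actually used by AGM or GBI. It assumes a small undirected graph given as a list of node labels plus an edge list, and it finds a canonical code by brute force over all node orderings, which is only feasible for a handful of nodes. The function name `canonical_code` and the input format are assumptions made for illustration.

```python
from itertools import permutations


def canonical_code(labels, edges):
    """Return a canonical code for a small undirected, node-labelled graph.

    labels: list of comparable node labels; node i has label labels[i]
    edges:  iterable of (i, j) node-index pairs

    Hypothetical helper for illustration only: it tries every node
    ordering, so the cost grows factorially with the number of nodes.
    """
    n = len(labels)
    adj = [[0] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1

    best = None
    for perm in permutations(range(n)):
        # Code = node labels in the permuted order, followed by the upper
        # triangle of the permuted adjacency matrix read column by column.
        node_part = tuple(labels[p] for p in perm)
        edge_part = tuple(
            adj[perm[i]][perm[j]] for j in range(n) for i in range(j)
        )
        code = (node_part, edge_part)
        if best is None or code < best:
            best = code
    return best


# The same C-O-C path written with two different node numberings.
path_a = canonical_code(["C", "O", "C"], [(0, 1), (1, 2)])
path_b = canonical_code(["O", "C", "C"], [(0, 1), (0, 2)])
# A C-C-O path: same labels, different structure.
path_c = canonical_code(["C", "C", "O"], [(0, 1), (1, 2)])
```

Because the code is the minimum over all orderings, two graphs receive the same code exactly when they are isomorphic, so comparing codes replaces a pairwise isomorphism test when counting occurrences of a candidate pattern. Here `path_a` and `path_b` compare equal, while `path_c` differs from both.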