Subdue: compression-based frequent pattern discovery in graph data

Authors:
Nikhil S. Ketkar;Lawrence B. Holder;Diane J. Cook
Affiliations:
University of Texas at Arlington;University of Texas at Arlington;University of Texas at Arlington
Venue:
Proceedings of the 1st international workshop on open source data mining: frequent pattern mining implementations
Year:
2005

Abstract

Most existing algorithms for mining graph datasets target complete discovery of frequent sub-graphs. We describe Subdue, a graph-based data mining system that uses a heuristic algorithm to discover sub-graphs that are not only frequent but also compress the graph dataset. The rationale for a compression-based approach to frequent pattern discovery is that it is preferable to produce a small number of highly interesting patterns than to generate a large number of patterns from which the interesting ones must then be identified. We experimentally compare Subdue with the graph mining systems gSpan and FSG on the Chemical Toxicity and Chemical Compounds datasets that are provided with gSpan. We also present results on the performance of Subdue on the Mutagenesis dataset and the KDD 2003 Citation Graph dataset. An analysis of the results indicates that Subdue can efficiently discover the best-compressing frequent patterns, which are fewer in number but can be of higher interest.
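
To illustrate why a compression criterion can rank patterns differently from raw frequency, here is a minimal sketch. It is not Subdue's implementation. It assumes a simplified size-based score in which the size of a graph is its number of vertices plus its number of edges, and in which every non-overlapping instance of a pattern S is collapsed into a single vertex to obtain the compressed graph G|S. The function names and all the counts in the example are hypothetical.

```python
def graph_size(num_vertices, num_edges):
    """Simplified size of a graph: vertices plus edges (an assumption of this sketch)."""
    return num_vertices + num_edges


def compression_value(g_vertices, g_edges, s_vertices, s_edges, instances):
    """Score a pattern S as size(G) / (size(S) + size(G|S)).

    G|S is G with each of the `instances` non-overlapping occurrences of S
    replaced by one vertex. Values above 1 mean the pattern compresses G.
    """
    if instances * s_vertices > g_vertices:
        raise ValueError("non-overlapping instances cannot cover more vertices than G has")
    # Collapsing one instance removes (s_vertices - 1) vertices and s_edges edges.
    compressed_vertices = g_vertices - instances * (s_vertices - 1)
    compressed_edges = g_edges - instances * s_edges
    original = graph_size(g_vertices, g_edges)
    compressed = graph_size(compressed_vertices, compressed_edges)
    return original / (graph_size(s_vertices, s_edges) + compressed)


# Hypothetical graph with 20 vertices and 25 edges.
# Pattern A: a triangle (3 vertices, 3 edges) with 4 instances.
# Pattern B: a single edge (2 vertices, 1 edge) with 6 instances.
score_a = compression_value(20, 25, 3, 3, 4)
score_b = compression_value(20, 25, 2, 1, 6)
print(f"triangle, 4 instances: {score_a:.3f}")
print(f"edge, 6 instances:     {score_b:.3f}")
```

In this toy setting the single-edge pattern occurs more often, yet the triangle scores higher because each of its instances removes more structure from the graph. A frequency threshold would report both patterns, whereas a compression-based ranking prefers the larger one.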