HDB-Subdue: A Scalable Approach to Graph Mining

Authors:
Srihari Padmanabhan;Sharma Chakravarthy
Affiliations:
IT Laboratory & Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington 76019;IT Laboratory & Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington 76019
Venue:
DaWaK '09 Proceedings of the 11th International Conference on Data Warehousing and Knowledge Discovery
Year:
2009

Abstract

Transactional data mining (association rules, decision trees, etc.) has been used effectively to find non-trivial patterns in categorical and unstructured data. For applications whose data has an inherent structure (e.g., social networks, proteins), graph mining is useful because mapping the structured data into a transactional representation leads to loss of information. Graph mining is used to identify interesting or frequent subgraphs. Database mining uses SQL and a relational representation to overcome the limitations of main-memory algorithms and to achieve scalability. This paper presents a scalable, SQL-based approach to graph mining, specifically interesting substructure discovery. Our approach handles the most general form of graphs, including directed edges, multiple edges between nodes, and cycles. Our primary goal in this work has been to address scalability and to map difficult and computationally expensive problems, such as pseudo-duplicate elimination, canonical labeling, and isomorphism checking, into SQL-based counterparts. The notion of minimum description length (MDL) has been cast into a corresponding metric for the relational representation. Our experimental analysis shows that the algorithm and approach presented in this paper can handle graphs with millions of nodes and edges.
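
To make the idea of a relational representation more concrete, the following is a minimal sketch of how a labeled, directed graph might be stored in tables and how instances of one-edge substructures could be counted with a join and a GROUP BY. The table names (`vertices`, `edges`), column names, and the function `one_edge_substructure_counts` are assumptions made for illustration and are not taken from the paper; the sketch does not implement the paper's MDL-based metric, canonical labeling, or pseudo-duplicate elimination.

```python
import sqlite3


def one_edge_substructure_counts(vertices, edges):
    """Count instances of each one-edge substructure.

    A one-edge substructure is identified here by the triple
    (source vertex label, edge label, target vertex label).
    The schema below is hypothetical, chosen only for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vertices (vno INTEGER PRIMARY KEY, vlabel TEXT)")
    # No uniqueness constraint on edges, so multiple edges between the
    # same pair of nodes (and cycles) can be stored.
    conn.execute("CREATE TABLE edges (v1 INTEGER, v2 INTEGER, elabel TEXT)")
    conn.executemany("INSERT INTO vertices VALUES (?, ?)", vertices)
    conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", edges)

    # Join each edge with its endpoint labels, then group identical
    # label triples to count how many instances each substructure has.
    rows = conn.execute(
        """
        SELECT s.vlabel, e.elabel, t.vlabel, COUNT(*) AS instances
        FROM edges e
        JOIN vertices s ON s.vno = e.v1
        JOIN vertices t ON t.vno = e.v2
        GROUP BY s.vlabel, e.elabel, t.vlabel
        ORDER BY instances DESC, s.vlabel, e.elabel, t.vlabel
        """
    ).fetchall()
    conn.close()
    return rows


# Small example graph with a repeated (multi) edge between vertices 1 and 2.
vertices = [(1, "A"), (2, "B"), (3, "A"), (4, "B"), (5, "C")]
edges = [(1, 2, "ab"), (3, 4, "ab"), (2, 5, "bc"), (1, 2, "ab")]
counts = one_edge_substructure_counts(vertices, edges)
```

In a scheme of this kind, larger substructures would plausibly be grown by joining the current instance table with the edge table again; that expansion step, and any ranking of the resulting substructures, is left out of the sketch.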