Transactional data mining (association rules, decision trees, etc.) has been used effectively to find non-trivial patterns in categorical and unstructured data. For applications with an inherent structure (e.g., social networks, proteins), graph mining is more suitable, since mapping structured data into a transactional representation loses information. Graph mining identifies interesting or frequent subgraphs. Database mining uses SQL and a relational representation to overcome the limitations of main-memory algorithms and to achieve scalability. This paper presents a scalable, SQL-based approach to graph mining, specifically interesting substructure discovery. Our approach handles the most general form of graphs, including directed edges, multiple edges between nodes, and cycles. Our primary goal has been to address scalability and to map difficult, computationally expensive problems such as pseudo-duplicate elimination, canonical labeling, and isomorphism checking into SQL-based counterparts. The notion of minimum description length (MDL) has been cast into a corresponding metric for the relational representation. Our experimental analysis shows that the algorithm and approach presented in this paper can handle graphs with millions of nodes and edges.
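To make the relational framing concrete, the following is a minimal sketch of how one-edge substructures might be counted with SQL joins. It is not the schema or algorithm from the paper: the table names (`vertices`, `edges`), the column names, and the function `one_edge_substructure_counts` are all assumptions chosen for illustration, and SQLite stands in for whatever relational engine is actually used. The `edges` table has no key, so directed edges, multiple edges between the same nodes, and cycles can all be stored.

```python
import sqlite3


def one_edge_substructure_counts(vertices, edges):
    """Count one-edge substructures by (source label, edge label, target label).

    Hypothetical schema, for illustration only:
      vertices(vid, vlabel)   one row per node
      edges(src, dst, elabel) one row per directed edge (duplicates allowed)
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vertices (vid INTEGER PRIMARY KEY, vlabel TEXT)")
    conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER, elabel TEXT)")
    conn.executemany("INSERT INTO vertices VALUES (?, ?)", vertices)
    conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", edges)

    # Join each edge to the labels of its endpoints, then group instances
    # that share the same label triple and count them.
    rows = conn.execute(
        """
        SELECT s.vlabel, e.elabel, t.vlabel, COUNT(*) AS n
        FROM edges e
        JOIN vertices s ON s.vid = e.src
        JOIN vertices t ON t.vid = e.dst
        GROUP BY s.vlabel, e.elabel, t.vlabel
        ORDER BY n DESC, s.vlabel, e.elabel, t.vlabel
        """
    ).fetchall()
    conn.close()
    return rows


# Small directed graph containing a cycle (1 -> 2 -> 1).
vertices = [(1, "A"), (2, "B"), (3, "A"), (4, "B"), (5, "C")]
edges = [(1, 2, "ab"), (3, 4, "ab"), (4, 5, "bc"), (2, 1, "ba")]
counts = one_edge_substructure_counts(vertices, edges)
```

Here the most frequent one-edge substructure is the `A -ab-> B` pattern, which occurs twice. Larger substructures would require further self-joins on `edges`, plus the duplicate elimination and canonical labeling that the abstract mentions; this sketch does not attempt either.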