HDB-Subdue: A Scalable Approach to Graph Mining

  • Authors:
  • Srihari Padmanabhan;Sharma Chakravarthy

  • Affiliations:
  • IT Laboratory & Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington 76019;IT Laboratory & Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington 76019

  • Venue:
  • DaWaK '09 Proceedings of the 11th International Conference on Data Warehousing and Knowledge Discovery
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Transactional data mining (association rules, decision trees etc.) has been effectively used to find non-trivial patterns in categorical and unstructured data. For applications that have an inherent structure (e.g., social networks, proteins), graph mining is useful since mapping the structured data into a transactional representation will lead to loss of information. Graph mining is used for identifying interesting or frequent subgraphs. Database mining uses SQL and relational representation to overcome limitations of main memory algorithms and to achieve scalability. This paper presents a scalable, SQL-based approach to graph mining --- specifically, interesting substructure discovery. The most general form of graphs including directed edges, multiple edges between nodes, and cycles are handled by our approach. Our primary goal in this work has been to address scalability, and map difficult and computationally expensive problems such as pseudo duplicate elimination, canonical labeling, and isomorphism checking into SQL-based counterparts. The notion of minimum description length (MDL) has been cast into corresponding metric for relational representation. Our experimental analysis shows that graphs with Millions of nodes and edges can be handled by the algorithm and the approach presented in this paper.