Efficient algorithms based on relational queries to mine frequent graphs

Authors:
Walter Garcia;Carlos Ordonez;Kai Zhao;Ping Chen
Affiliations:
University of Houston - Downtown, Houston, TX, USA;University of Houston, Houston, TX, USA;University of Houston, Houston, TX, USA;University of Houston - Downtown, Houston, TX, USA
Venue:
PIKM '10 Proceedings of the 3rd workshop on Ph.D. students in information and knowledge management
Year:
2010

Abstract

Frequent subgraph mining is an important data mining problem with wide application in science. For instance, graphs can represent structural relationships in network topologies, chemical compounds, protein structures, and so on. Searching for patterns in graph databases is difficult because graph operations generally have higher time complexity than the equivalent operations on frequent itemsets. From a practical standpoint, databases keep growing, which creates both the opportunity and the need to mine graphs. Even though there is a significant body of work on graph mining, most techniques work outside the database system. Programming frequent graph mining in SQL is more difficult than in traditional approaches because the graph must be represented as a table and the algorithmic steps must be written as relational queries. In our research, we study three fundamental problems under a database approach: graph storage and indexing, frequent subgraph search, and subgraph isomorphism identification. We outline the main research issues and our approach to solving them. We also present preliminary experimental validation focusing on query optimizations and time complexity.
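
To make the idea of "graph as a table, algorithm step as a relational query" concrete, here is a minimal sketch of the simplest case: finding frequent single-edge subgraphs. It is not the authors' schema or algorithm. The table name `edge`, its columns, the function `frequent_edges`, and the choice of SQLite are all assumptions made for illustration. Each row stores one edge as a pair of vertex labels together with the id of the graph it belongs to, and the support of an edge pattern is the number of distinct graphs that contain it.

```python
import sqlite3


def frequent_edges(graphs, min_support):
    """Return (src_label, dst_label, support) for frequent labeled edges.

    graphs: dict mapping a graph id to an iterable of (label_a, label_b)
            edges. Edges are treated as undirected (an assumption).
    min_support: minimum number of distinct graphs containing the edge.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical storage layout: one row per edge, tagged with its graph id.
    conn.execute("CREATE TABLE edge (gid INTEGER, src TEXT, dst TEXT)")

    # Normalize label order so (O, C) and (C, O) count as the same pattern.
    rows = [
        (gid, min(a, b), max(a, b))
        for gid, edges in graphs.items()
        for a, b in edges
    ]
    conn.executemany("INSERT INTO edge VALUES (?, ?, ?)", rows)

    # Support counting as a single relational query: group by pattern and
    # count the distinct graphs in which the pattern occurs.
    query = """
        SELECT src, dst, COUNT(DISTINCT gid) AS support
        FROM edge
        GROUP BY src, dst
        HAVING COUNT(DISTINCT gid) >= ?
        ORDER BY support DESC, src, dst
    """
    result = conn.execute(query, (min_support,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    toy = {
        1: [("C", "O"), ("C", "N")],
        2: [("O", "C"), ("C", "H")],
        3: [("C", "O")],
    }
    print(frequent_edges(toy, min_support=2))
```

Larger patterns would require joining the edge table with itself to extend candidates and a check for subgraph isomorphism, which is where the difficulty described in the abstract lies; this sketch covers only the first level of such a search.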