FOGGER: an algorithm for graph generator discovery

Authors:
Zhiping Zeng; Jianyong Wang; Jun Zhang; Lizhu Zhou
Affiliations:
Tsinghua University, Beijing, P.R. China; Tsinghua University, Beijing, P.R. China; Tsinghua University, Beijing, P.R. China; Tsinghua University, Beijing, P.R. China
Venue:
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Year:
2009

Abstract

To the best of our knowledge, all existing graph pattern mining algorithms mine only the closed, the maximal, or the complete set of frequent subgraphs. None mines graph generators, which in some applications are preferable to closed subgraphs according to the Minimum Description Length principle. In this paper, we study a new frequent subgraph mining problem, frequent connected graph generator mining. The problem poses significant challenges because of the inherent complexity of frequent subgraph mining and the absence of the Apriori property for graph generators. Nevertheless, we present an efficient solution, FOGGER. By exploring properties of graph generators, we propose two effective pruning techniques, backward edge pruning and forward edge pruning, which prune the branches of the well-known DFS code enumeration tree that do not contain graph generators. To further improve efficiency, we also devise an index structure, ADI++, to facilitate subgraph isomorphism checking. We experimentally evaluate various aspects of FOGGER on both real and synthetic datasets. Our results demonstrate that the two pruning techniques are effective in pruning the unpromising parts of the search space, and that FOGGER is efficient and scalable with respect to the base size of the input database. In addition, a performance study of a graph generator-based classification model shows that the generator-based model is much simpler and achieves almost the same accuracy in classifying chemical compounds as the closed subgraph-based model.
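
As a rough illustration of why the Apriori property can fail for connected graph generators, the following toy sketch may help. It is not FOGGER: it uses no DFS codes, no pruning, and no ADI++ index. It rests on two assumptions that do not come from the abstract: (1) a connected pattern is taken to be a generator when no connected proper subgraph has the same support, and (2) vertex labels are unique within each graph, so that subgraph isomorphism reduces to edge-set containment. All names (`graph`, `support`, `is_connected`, `is_generator`, `db`) are hypothetical.

```python
# Toy sketch only. Assumption: vertex labels are unique within each graph,
# so "pattern is a subgraph of g" is just edge-set containment.

def graph(*edges):
    """Build a graph as a frozenset of undirected edges (label pairs)."""
    return frozenset(frozenset(e) for e in edges)

def support(pattern, database):
    """Number of database graphs that contain every edge of the pattern."""
    return sum(1 for g in database if pattern <= g)

def is_connected(edges):
    """True if the (non-empty) edge set forms a single connected component."""
    if not edges:
        return False
    edges = list(edges)
    seen = set(edges[0])          # vertices reached so far
    changed = True
    while changed:
        changed = False
        for e in edges:
            # An edge touching the reached set pulls in its other endpoint.
            if e & seen and not e <= seen:
                seen |= e
                changed = True
    return all(e <= seen for e in edges)

def is_generator(pattern, database):
    """Assumed definition: no connected proper subgraph has equal support.

    Because support never decreases when edges are removed, it is enough
    to test the connected subgraphs obtained by deleting a single edge.
    Single-edge patterns are treated as generators by convention here.
    """
    sup = support(pattern, database)
    for e in pattern:
        sub = pattern - {e}
        if sub and is_connected(sub) and support(sub, database) == sup:
            return False
    return True

# Hypothetical three-graph database over vertex labels a, b, c, d.
db = [
    graph(("a", "b"), ("b", "c"), ("c", "d")),
    graph(("a", "b"), ("b", "c")),
    graph(("b", "c"), ("c", "d")),
]

path3 = graph(("a", "b"), ("b", "c"), ("c", "d"))  # a-b-c-d, support 1
path2 = graph(("a", "b"), ("b", "c"))              # a-b-c,   support 2

print(is_generator(path3, db))  # True
print(is_generator(path2, db))  # False
```

In this toy database the path a-b-c-d is a generator even though its connected subgraph a-b-c is not, because the only subgraph of a-b-c-d with equal support (edges a-b and c-d) is disconnected and therefore does not count. A search that discarded every supergraph of a non-generator would miss such patterns, which is consistent with the abstract's remark that generators lack the Apriori property.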