FOGGER: an algorithm for graph generator discovery

  • Authors:
  • Zhiping Zeng;Jianyong Wang;Jun Zhang;Lizhu Zhou

  • Affiliations:
  • Tsinghua University, Beijing, P.R.China;Tsinghua University, Beijing, P.R.China;Tsinghua University, Beijing, P.R.China;Tsinghua University, Beijing, P.R.China

  • Venue:
  • Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

To our best knowledge, all existing graph pattern mining algorithms can only mine either closed, maximal or the complete set of frequent subgraphs instead of graph generators which are preferable to the closed subgraphs according to the Minimum Description Length principle in some applications. In this paper, we study a new problem of frequent subgraph mining, called frequent connected graph generator mining, which poses significant challenges due to the underlying complexity associated with frequent subgraph mining as well as the absence of Apriori property for graph generators. Whereas, we still present an efficient solution FOGGER for this new problem. By exploring some properties of graph generators, two effective pruning techniques, backward edge pruning and forward edge pruning, are proposed to prune the branches of the well-known DFS code enumeration tree that do not contain graph generators. To further improve the efficiency, an effective index structure, ADI++, is also devised to facilitate the subgraph isomorphism checking. We experimentally evaluate various aspects of FOGGER using both real and synthetic datasets. Our results demonstrate that the two pruning techniques are effective in pruning the unpromising parts of search space, and FOGGER is efficient and scalable in terms of the base size of input databases. Meanwhile, the performance study for graph generator-based classification model shows that generator-based model is much simpler and can achieve almost the same accuracy for classifying chemical compounds in comparison with closed subgraph-based model.