Duplicate candidate elimination and fast support calculation for frequent subgraph mining

Authors:
Andrés Gago-Alonso;Jesús Ariel Carrasco-Ochoa;José Eladio Medina-Pagola;José Fco. Martínez-Trinidad
Affiliations:
Advanced Technologies Application Center, La Habana, Cuba and National Institute of Astrophysics, Optics and Electronics, Puebla, Mexico;National Institute of Astrophysics, Optics and Electronics, Puebla, Mexico;Advanced Technologies Application Center, La Habana, Cuba;National Institute of Astrophysics, Optics and Electronics, Puebla, Mexico
Venue:
IDEAL'09 Proceedings of the 10th international conference on Intelligent data engineering and automated learning
Year:
2009

Citing 8

Frequent Subgraph Discovery
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining

Mining Molecular Fragments: Finding Relevant Substructures of Molecules
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining

gSpan: Graph-Based Substructure Pattern Mining
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining

Efficient Mining of Frequent Subgraphs in the Presence of Isomorphism
ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining

A quickstart in frequent structure mining can make a difference
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining

Frequent pattern mining: current status and future directions
Data Mining and Knowledge Discovery

GDClust: A Graph-Based Document Clustering Technique
ICDMW '07 Proceedings of the Seventh IEEE International Conference on Data Mining Workshops

A quantitative comparison of the subgraph miners mofa, gspan, FFSM, and gaston
PKDD'05 Proceedings of the 9th European conference on Principles and Practice of Knowledge Discovery in Databases

Cited 2

Full duplicate candidate pruning for frequent connected subgraph mining
Integrated Computer-Aided Engineering

Frequent approximate subgraphs as features for graph-based image classification
Knowledge-Based Systems


Abstract

Frequent connected subgraph mining (FCSM) is a data mining task with a wide range of real-world applications. Most previous studies focus on pruning search subspaces or optimizing subgraph isomorphism (SI) tests. This paper introduces a new property that removes all duplicate candidates in FCSM during enumeration. Based on this property, a new FCSM algorithm called gdFil is proposed. Because the candidate space contains no duplicates, a fast evaluation strategy can be used to reduce the cost of SI tests without wasting memory; to this end, a data structure that reduces the cost of SI tests is introduced. The performance of gdFil is compared against other reported algorithms.
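
The abstract does not specify the duplicate-elimination property or the data structure used by gdFil, so the following Python sketch is not the authors' method. It only illustrates the two underlying problems under simple assumptions: two candidates are duplicates when they are isomorphic (detected here by a brute-force canonical form), and SI tests can be restricted to the graphs that contained the parent pattern (an id list, a common technique that is assumed here, not taken from the paper). All names (`make_graph`, `canonical_form`, `contains`, `support`) are hypothetical.

```python
from itertools import permutations

def make_graph(vertex_labels, edges):
    """Labeled graph: tuple of vertex labels plus a set of (u, v, label), u < v."""
    norm = frozenset((min(u, v), max(u, v), lab) for u, v, lab in edges)
    return (tuple(vertex_labels), norm)

def canonical_form(graph):
    """Smallest relabeling over all vertex permutations (tiny graphs only)."""
    labels, edges = graph
    n = len(labels)
    best = None
    for perm in permutations(range(n)):  # perm[old_vertex] = new_vertex
        new_labels = [None] * n
        for old, new in enumerate(perm):
            new_labels[new] = labels[old]
        new_edges = tuple(sorted(
            (min(perm[u], perm[v]), max(perm[u], perm[v]), lab)
            for u, v, lab in edges))
        key = (tuple(new_labels), new_edges)
        if best is None or key < best:
            best = key
    return best

def contains(graph, pattern):
    """Brute-force SI test: is pattern a (non-induced) subgraph of graph?"""
    g_labels, g_edges = graph
    p_labels, p_edges = pattern
    k = len(p_labels)
    for image in permutations(range(len(g_labels)), k):
        if any(g_labels[image[i]] != p_labels[i] for i in range(k)):
            continue
        if all((min(image[u], image[v]), max(image[u], image[v]), lab) in g_edges
               for u, v, lab in p_edges):
            return True
    return False

def support(pattern, database, candidate_ids):
    """Ids of graphs containing pattern, testing only the ids in candidate_ids."""
    return [gid for gid in candidate_ids if contains(database[gid], pattern)]

# Toy database: C-C-O, C-O, C-C (all edges share the label "-").
database = [
    make_graph(["C", "C", "O"], [(0, 1, "-"), (1, 2, "-")]),
    make_graph(["C", "O"], [(0, 1, "-")]),
    make_graph(["C", "C"], [(0, 1, "-")]),
]

parent = make_graph(["C", "O"], [(0, 1, "-")])
parent_ids = support(parent, database, range(len(database)))

# The same child pattern C-C-O reached through two different vertex orderings.
cand_a = make_graph(["C", "C", "O"], [(0, 1, "-"), (1, 2, "-")])
cand_b = make_graph(["O", "C", "C"], [(0, 1, "-"), (1, 2, "-")])
is_duplicate = canonical_form(cand_a) == canonical_form(cand_b)

# Only graphs that contained the parent need an SI test for the child.
child_ids = support(cand_a, database, parent_ids)
```

In this toy setting the duplicate is detected after generation, which costs a canonical-form computation per candidate. The abstract's claim is stronger: gdFil is said to avoid producing duplicates during enumeration in the first place.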