Full duplicate candidate pruning for frequent connected subgraph mining

Authors:
Andrés Gago-Alonso; Jesús A. Carrasco-Ochoa; José E. Medina-Pagola; José Fco. Martínez-Trinidad
Affiliations:
Advanced Technologies Application Center, Havana, Cuba; (Corresponding author. Tel.: +52 (222) 266 31 00, ext. 8311; Fax: +52 (222) 266 34 52; E-mail: ariel@inaoep.mx) Computer Science Department, National Institute of Astrophysics, Optics and Electronics, Puebla, ...; Advanced Technologies Application Center, Havana, Cuba; Computer Science Department, National Institute of Astrophysics, Optics and Electronics, Puebla, México
Venue:
Integrated Computer-Aided Engineering
Year:
2010

Abstract

Support calculation and duplicate detection are the most challenging and unavoidable subtasks in frequent connected subgraph (FCS) mining. The most successful FCS mining algorithms have focused on optimizing these subtasks, because existing solutions to both have high computational complexity. In this paper, we propose two novel properties that allow all duplicate candidates to be removed before support calculation. In addition, we introduce a fast support calculation strategy based on embedding structures. Both properties and the new embedding structure are used to design two new algorithms: gdFil, for mining all FCSs, and gdClosed, for mining all closed FCSs. Experimental results show that the proposed algorithms outperform other well-known FCS mining algorithms.
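The abstract does not spell out the two pruning properties or the embedding structure used by gdFil and gdClosed. The Python below is therefore only a generic sketch of the two subtasks the abstract names, under stated assumptions. Duplicates are detected with a brute-force canonical form, which stands in for canonical codes such as gSpan's minimum DFS code and is not the paper's properties. Support is counted from embedding lists that are extended one edge at a time. The graph representation and all function names (`canonical_form`, `prune_duplicates`, `initial_embeddings`, `extend_embeddings`, `support`) are hypothetical.

```python
from itertools import permutations

# Toy representation (an assumption for illustration, not the paper's
# data structures): a labeled undirected graph is a pair (node_labels, edges),
# where node_labels[v] is the label of vertex v and edges maps
# frozenset({u, v}) to an edge label. Self-loops are not allowed.


def canonical_form(node_labels, edges):
    """Brute-force canonical code: the smallest relabeled code over all
    vertex permutations, so isomorphic graphs get equal codes.
    Exponential in the number of vertices; only suitable for tiny patterns."""
    n = len(node_labels)
    best = None
    for perm in permutations(range(n)):  # perm[old] = new vertex id
        labels = [None] * n
        for old in range(n):
            labels[perm[old]] = node_labels[old]
        edge_code = []
        for e, lab in edges.items():
            u, v = (perm[x] for x in e)
            edge_code.append((min(u, v), max(u, v), lab))
        code = (tuple(labels), tuple(sorted(edge_code)))
        if best is None or code < best:
            best = code
    return best


def prune_duplicates(candidates):
    """Keep the first candidate of each isomorphism class, so duplicates
    are discarded before any support is computed."""
    seen, unique = set(), []
    for node_labels, edges in candidates:
        code = canonical_form(node_labels, edges)
        if code not in seen:
            seen.add(code)
            unique.append((node_labels, edges))
    return unique


def initial_embeddings(db, label):
    """Embeddings of a one-vertex pattern as (graph id, mapping) pairs,
    where mapping[i] is the graph vertex matched by pattern vertex i."""
    return [(gid, (v,)) for gid, (labels, _) in enumerate(db)
            for v, lab in enumerate(labels) if lab == label]


def extend_embeddings(db, embeddings, src, edge_label, new_label):
    """Forward extension: add a new pattern vertex labeled new_label,
    joined to pattern vertex src by an edge labeled edge_label. Each stored
    embedding is grown by scanning the graph's edges incident to its image
    of src, instead of re-running a subgraph-isomorphism test."""
    extended = []
    for gid, mapping in embeddings:
        node_labels, edges = db[gid]
        g_src = mapping[src]
        for e, lab in edges.items():
            if g_src not in e or lab != edge_label:
                continue
            (other,) = e - {g_src}
            # Embeddings are injective: the new vertex must be unused.
            if other not in mapping and node_labels[other] == new_label:
                extended.append((gid, mapping + (other,)))
    return extended


def support(embeddings):
    """Support = number of distinct database graphs with >= 1 embedding."""
    return len({gid for gid, _ in embeddings})


# Tiny made-up database for illustration.
toy_db = [
    (["C", "C", "O"], {frozenset({0, 1}): "s", frozenset({1, 2}): "d"}),
    (["C", "O"], {frozenset({0, 1}): "d"}),
    (["N", "C"], {frozenset({0, 1}): "s"}),
]
c_emb = initial_embeddings(toy_db, "C")                 # pattern: C
co_emb = extend_embeddings(toy_db, c_emb, 0, "d", "O")  # pattern: C -d- O
print(support(c_emb), support(co_emb))  # 3 2

dups = prune_duplicates([
    (["C", "O"], {frozenset({0, 1}): "d"}),
    (["O", "C"], {frozenset({0, 1}): "d"}),  # same pattern, vertices swapped
])
print(len(dups))  # 1
```

Keeping embeddings with each candidate trades memory for speed: the support of an extension follows from the parent's stored list, with no fresh subgraph-isomorphism test per database graph. Only forward (new-vertex) extensions are shown here; an edge closing a cycle between two existing pattern vertices could be checked against each embedding in the same way.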