Tree mining: Equivalence classes for candidate generation

Authors:
F. Del Razo López;A. Laurent;M. Teisseire;P. Poncelet
Affiliations:
Toluca Institute of Technology - ITT, Av. Instituto Tecnológico S/N - Col. Ex-Rancho La Virgen, Metepec, Edo. de México C.P. 52140, México;University Montpellier 2 - LIRMM, 161, rue Ada, Montpellier, France. E-mail: {laurent,teisseire,poncelet}@lirmm.fr;University Montpellier 2 - LIRMM, 161, rue Ada, Montpellier, France. E-mail: {laurent,teisseire,poncelet}@lirmm.fr;University Montpellier 2 - LIRMM, 161, rue Ada, Montpellier, France. E-mail: {laurent,teisseire,poncelet}@lirmm.fr
Venue:
Intelligent Data Analysis
Year:
2009

Abstract

With the rise of research fields such as bioinformatics and taxonomies, and with the growing use of XML documents, tree data are playing an increasingly important role. Mining frequent subtrees from these data is thus an active research problem, and traditional methods (e.g., itemset mining from transactional databases) have to be extended to handle tree-structured data. Several approaches have been proposed in the literature, mainly based on generate-and-prune methods. However, they generate a large volume of candidates before pruning them, even though some candidates could be discarded because they contain infrequent subtrees. We thus propose a novel approach, called pivot, based on equivalence classes, in order to decrease the number of candidates. Three equivalence classes are defined: the first relies on a right equivalence relation between two trees, the second on a left equivalence relation, and the third on a root equivalence relation. In this paper, we introduce this new method and show that it is complete (i.e., no frequent subtree is missed) and efficient, as illustrated by experiments conducted on synthetic and real datasets.
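The observation that a candidate containing an infrequent subtree can be discarded may be illustrated with a minimal Python sketch. This is not the pivot method and does not implement its equivalence classes; it only shows the general Apriori-style necessary condition that generate-and-prune miners rely on. The tree encoding (nested tuples) and the names `remove_one_leaf` and `can_be_frequent` are assumptions made for this illustration.

```python
from typing import Iterator, Set, Tuple

# Assumed encoding: a rooted, ordered, labeled tree is (label, (child, child, ...)).
Tree = Tuple[str, tuple]


def remove_one_leaf(tree: Tree) -> Iterator[Tree]:
    """Yield every tree obtained by deleting exactly one leaf of `tree`."""
    label, children = tree
    for i, child in enumerate(children):
        if not child[1]:
            # The child is a leaf: drop it.
            yield (label, children[:i] + children[i + 1:])
        else:
            # Otherwise delete a leaf somewhere below this child.
            for smaller in remove_one_leaf(child):
                yield (label, children[:i] + (smaller,) + children[i + 1:])


def can_be_frequent(candidate: Tree, frequent_smaller: Set[Tree]) -> bool:
    """Necessary condition only: every one-leaf-smaller subtree must be frequent."""
    return all(sub in frequent_smaller for sub in remove_one_leaf(candidate))


# Hypothetical set of frequent two-node trees.
frequent_2 = {
    ("a", (("b", ()),)),
    ("a", (("c", ()),)),
}

kept = can_be_frequent(("a", (("b", ()), ("c", ()))), frequent_2)
pruned = can_be_frequent(("a", (("b", ()), ("d", ()))), frequent_2)
```

Here `kept` is true because both two-node subtrees of the first candidate are in the frequent set, while `pruned` is false because the subtree with root `a` and child `d` is not. A candidate that passes this check still needs its support counted against the database.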