An Experimental Comparison of Different Inclusion Relations in Frequent Tree Mining

Authors:
Jeroen De Knijf;Ad Feelders
Affiliations:
(Correspd.) Algorithmic Data Analysis Group, Department of Information and Computing Sciences, Universiteit Utrecht, PO Box 80.089, 3508 TB Utrecht, The Netherlands. Jeroen.DeKnijf@ua.ac.be/ ad@cs ...;Algorithmic Data Analysis Group, Department of Information and Computing Sciences, Universiteit Utrecht, PO Box 80.089, 3508 TB Utrecht, The Netherlands. Jeroen.DeKnijf@ua.ac.be/ ad@cs.uu.nl
Venue:
Fundamenta Informaticae - Progress on Multi-Relational Data Mining
Year:
2009

Abstract

In recent years, a variety of mining algorithms have been developed to derive all frequent subtrees from a database of labeled ordered rooted trees. These algorithms share properties such as enumeration strategies and pruning techniques. They differ, however, in the tree inclusion relation used and in the way attribute values are handled. In this work we investigate the different approaches with respect to the 'usefulness' of the derived patterns, in particular the performance of classifiers that use the derived patterns as features. To find a good trade-off between expressiveness and runtime performance, we also take into account the complexity of the resulting classifiers, as well as the run time and memory usage of each approach. The experiments are performed on two real data sets and two synthetic data sets. The results show that significant improvements in both predictive performance and computational efficiency can be gained by choosing the right tree mining approach.