We study the problem of mining closed frequent subtrees from tree databases that are updated regularly over time. Closed frequent subtrees provide condensed yet complete information about all frequent subtrees in the database. Although mining closed frequent subtrees is generally faster than mining all frequent subtrees, it is still a very time-consuming process, so it is undesirable to mine from scratch when the change to the database is small. The set of previously mined closed subtrees should be reused as much as possible to compute newly emerging subtrees. In this paper, we propose a novel and efficient incremental mining algorithm for closed frequent labeled ordered trees. We adopt a divide-and-conquer strategy and apply different mining techniques in different parts of the mining process. The proposed algorithm requires no additional scan of the whole database, while its memory usage remains reasonable. Our experimental study on both synthetic and real-life datasets demonstrates the efficiency and scalability of our algorithm.
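To make the notion of a closed frequent subtree concrete, the following is a minimal brute-force sketch, not the incremental algorithm proposed in the paper. It assumes that a labeled ordered tree is a nested tuple `(label, children)`, that a pattern occurs in a tree when it matches some node with its children mapped to an order-preserving subsequence of that node's children, and that support counts the database trees containing the pattern. A frequent pattern is treated as closed when no proper supertree among the candidates has the same support. All function and variable names are hypothetical.

```python
def matches_at(pattern, node):
    """True if pattern matches with its root placed on node."""
    p_label, p_children = pattern
    n_label, n_children = node
    if p_label != n_label:
        return False
    # Greedy left-to-right matching: each pattern child takes the earliest
    # remaining child of node that it matches, preserving sibling order.
    remaining = iter(n_children)
    return all(any(matches_at(pc, nc) for nc in remaining) for pc in p_children)


def contains(pattern, tree):
    """True if pattern occurs at the root of tree or anywhere below it."""
    if matches_at(pattern, tree):
        return True
    return any(contains(pattern, child) for child in tree[1])


def support(pattern, database):
    """Number of database trees that contain pattern at least once."""
    return sum(1 for tree in database if contains(pattern, tree))


def closed_patterns(candidates, database, min_support):
    """Keep frequent candidates with no equally supported proper supertree."""
    frequent = {}
    for p in candidates:
        s = support(p, database)
        if s >= min_support:
            frequent[p] = s
    closed = []
    for p, s in frequent.items():
        absorbed = any(
            q != p and sq == s and contains(p, q) for q, sq in frequent.items()
        )
        if not absorbed:
            closed.append(p)
    return closed


# Toy data (assumed for illustration only).
A = ("a", ())
B = ("b", ())
C = ("c", ())
AB = ("a", (B,))
AC = ("a", (C,))
ABC = ("a", (B, C))

database = [ABC, ABC, AB]
candidates = [A, B, C, AB, AC, ABC]
closed = closed_patterns(candidates, database, min_support=2)
```

In this toy database all six candidates are frequent at a minimum support of 2, but only `AB` (support 3) and `ABC` (support 2) are closed; the support of every other candidate can be read off from its smallest closed supertree, which is the sense in which the closed set is condensed yet complete. The sketch rescans the whole database for every candidate, which is exactly the cost an incremental approach aims to avoid.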