An efficient algorithm for mining both closed and maximal frequent free subtrees using canonical forms

  • Authors:
  • Ping Guo;Yang Zhou;Jun Zhuang;Ting Chen;Yan-Rong Kang

  • Affiliations:
  • School of Computer Science, Chongqing University, Chongqing, China;School of Computer Science, Chongqing University, Chongqing, China;School of Computer Science, Chongqing University, Chongqing, China;School of Computer Science, Chongqing University, Chongqing, China;School of Computer Science, Chongqing University, Chongqing, China

  • Venue:
  • ADMA'05 Proceedings of the First International Conference on Advanced Data Mining and Applications
  • Year:
  • 2005

Abstract

A large number of text files, including HTML and XML documents, can be organized as tree structures, and one objective of data mining is to discover frequent patterns in them. In this paper, we first introduce a canonical form for free trees, based on the breadth-first canonical string. Second, we present some properties of closed frequent subtrees and maximal frequent subtrees, as well as the relationships between them. Third, we study a pruning technique for frequent free subtrees and an improvement to the mining of nonclosed frequent free subtrees. Finally, we present an algorithm that mines all closed and maximal frequent free subtrees and prove its validity.
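
To illustrate what a canonical form for a free tree buys a miner, here is a minimal Python sketch. It is not the breadth-first canonical string introduced in the paper, whose exact definition is not given in this abstract. Instead it assumes a simpler, well-known substitute: root the tree at its center (or at each of its two centers), encode each rooted tree as a parenthesized string with child encodings sorted, and keep the smallest string. The names `tree_centers`, `encode`, and `canonical_string` are hypothetical, and vertex labels are assumed to be strings without parentheses.

```python
def tree_centers(adj):
    """Return the one or two center vertices of a free tree by peeling leaves."""
    if len(adj) <= 2:
        return sorted(adj)
    degree = {v: len(neighbors) for v, neighbors in adj.items()}
    leaves = [v for v, d in degree.items() if d == 1]
    remaining = len(adj)
    while remaining > 2:
        remaining -= len(leaves)
        next_leaves = []
        for leaf in leaves:
            for nb in adj[leaf]:
                degree[nb] -= 1
                if degree[nb] == 1:  # nb has just become a leaf
                    next_leaves.append(nb)
        leaves = next_leaves
    return sorted(leaves)


def encode(adj, labels, root, parent=None):
    """Encode the tree rooted at `root`; sorting children removes sibling order."""
    children = sorted(
        encode(adj, labels, child, root) for child in adj[root] if child != parent
    )
    return "(" + labels[root] + "".join(children) + ")"


def canonical_string(edges, labels):
    """Return a string that is identical for isomorphic labeled free trees."""
    adj = {v: [] for v in labels}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # A free tree has no root, so try each center and keep the smallest encoding.
    return min(encode(adj, labels, c) for c in tree_centers(adj))


# The same path B-A-B-C written with two different vertex numberings.
path_1 = canonical_string([(0, 1), (0, 2), (2, 3)],
                          {0: "A", 1: "B", 2: "B", 3: "C"})
path_2 = canonical_string([(11, 10), (12, 11), (13, 12)],
                          {10: "C", 11: "B", 12: "A", 13: "B"})
# A star with the same labels, which is a different tree.
star = canonical_string([(0, 1), (0, 2), (0, 3)],
                        {0: "A", 1: "B", 2: "B", 3: "C"})
print(path_1 == path_2, path_1 == star)
```

Because isomorphic trees map to the same string, a miner can use the string as a dictionary key to count the support of a candidate subtree once, however the candidate was generated. Whether the paper's breadth-first form behaves the same way in every detail is an assumption here.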