Bottom-up discovery of frequent rooted unordered subtrees

  • Authors:
  • Yijun Bei; Gang Chen; Lidan Shou; Xiaoyan Li; Jinxiang Dong

  • Affiliations:
  • College of Computer Science, Zhejiang University, Yuquan Campus, Hangzhou 310027, China (all authors)

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2009

Quantified Score

Hi-index 0.07

Abstract

Over the past decade, XML has emerged as the standard language for information exchange over the Internet. Its tree-structured data model makes it well suited to storing, querying, and manipulating complex data, so discovering frequent tree patterns in tree-structured data has become an interesting topic in XML data management. In this paper, we propose a tree mining algorithm, named BUXMiner, for finding a special class of frequent trees, called rooted unordered trees, in a tree-structured database. BUXMiner uses an efficient bottom-up approach to enumerate all candidate trees over a compact global tree guide and computes the frequent trees from that guide. In addition to BUXMiner, we propose a mining approach called BUMXMiner to discover the maximal frequent rooted unordered trees. We compare BUXMiner with two earlier tree mining algorithms, XQPMinerTID and FastXMiner, which were also proposed to discover rooted unordered trees. The experimental results show that our algorithm outperforms XQPMinerTID and FastXMiner in terms of efficiency. The performance results on real-world applications also indicate that the proposed tree mining algorithms are useful in a variety of web applications, such as analyzing web page access patterns and mining frequent XML query patterns for caching.
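
To make the notion of a rooted unordered tree concrete, the sketch below shows one common way to treat two trees as equal when they differ only in sibling order: encode each tree as a canonical string with children sorted. This is an illustration only and not the BUXMiner algorithm; it counts whole trees rather than subtrees and uses no global tree guide. The `(label, children)` tuple representation and the names `canonical` and `frequent_trees` are assumptions introduced here, and labels are assumed to contain no parentheses.

```python
from collections import Counter


def canonical(tree):
    """Return a string encoding that ignores the order of siblings.

    `tree` is a hypothetical (label, children) tuple, where `children`
    is a list of trees in the same form.
    """
    label, children = tree
    # Sorting the children's encodings removes any dependence on sibling order.
    encoded_children = sorted(canonical(child) for child in children)
    return "(" + label + "".join(encoded_children) + ")"


def frequent_trees(database, min_support):
    """Count identical trees (up to sibling order) and keep the frequent ones."""
    counts = Counter(canonical(tree) for tree in database)
    return {form: n for form, n in counts.items() if n >= min_support}


# Toy database: the first two trees differ only in sibling order.
db = [
    ("a", [("b", []), ("c", [])]),
    ("a", [("c", []), ("b", [])]),
    ("a", [("b", [])]),
]
result = frequent_trees(db, 2)
```

Under these assumptions, the first two trees in `db` receive the same encoding and are counted together, while the third falls below the support threshold of 2. A real subtree miner would additionally have to enumerate candidate subtrees and test unordered inclusion, which this sketch does not attempt.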