Discovery of Frequent Tree Structured Patterns in Semistructured Web Documents

  • Authors:
  • Tetsuhiro Miyahara;Takayoshi Shoudai;Tomoyuki Uchida;Kenichi Takahashi;Hiroaki Ueda

  • Affiliations:
  • -;-;-;-;-

  • Venue:
  • PAKDD '01 Proceedings of the 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining
  • Year:
  • 2001

Quantified Score

Hi-index 0.00

Visualization

Abstract

Many documents such as Web documents or XML files have no rigid structure. Such semistructured documents have been rapidly increasing. We propose a new method for discovering frequent tree structured patterns in semistructured Web documents. We consider the data mining problem of finding all maximally frequent tag tree patterns in semistructured data such as Web documents. A tag tree pattern is an edge labeled tree which has hyperedges as variables. An edge label is a tag or a keyword inWeb documents, and a variable can be substituted by any tree. So a tag tree pattern is suited for representing tree structured patterns in semistructured Web documents. We present an algorithm for finding all maximally frequent tag tree patterns. Also we report some experimental results on XML documents by using our algorithm.