A polynomial time matching algorithm of structured ordered tree patterns for data mining from semistructured data

  • Authors:
  • Yusuke Suzuki;Kohtaro Inomae;Takayoshi Shoudai;Tetsuhiro Miyahara;Tomoyuki Uchida

  • Affiliations:
  • Department of Informatics, Kyushu University, Kasuga, Japan;Department of Informatics, Kyushu University, Kasuga, Japan;Department of Informatics, Kyushu University, Kasuga, Japan;Faculty of Information Sciences, Hiroshima City University, Hiroshima, Japan;Faculty of Information Sciences, Hiroshima City University, Hiroshima, Japan

  • Venue:
  • ILP'02 Proceedings of the 12th international conference on Inductive logic programming
  • Year:
  • 2002

Abstract

Tree-structured data such as HTML/XML files are represented by rooted trees with ordered children and edge labels. Knowledge representations for tree-structured data are important for discovering the interesting features that such data have. In this paper, as a representation of structural features, we propose a structured ordered tree pattern, called a term tree, which is a rooted tree pattern consisting of ordered children and structured variables. A variable in a term tree can be substituted by an arbitrary tree. Deciding whether or not given tree-structured data have a particular structural feature is a core problem in data mining from large collections of tree-structured data. We consider the problem of deciding whether or not a term tree t matches a tree T, that is, whether T is obtained from t by substituting some trees for the variables in t. This problem is called the membership problem for t and T. Given a term tree t and a tree T, we present an O(nN) time algorithm for solving the membership problem for t and T, where n and N are the numbers of vertices in t and T, respectively. We also report some experiments on applying our matching algorithm to a collection of real Web documents.
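To make the membership problem concrete, the following is a minimal Python sketch under simplifying assumptions that are not taken from the paper: a tree is a list of `(edge_label, subtree)` pairs in child order, a variable may appear only as a leaf entry of a term tree, and a variable absorbs one or more consecutive child subtrees of T. The names `VAR` and `matches` are hypothetical, and this naive dynamic program over child sequences is only an illustration, not the O(nN) algorithm presented in the paper.

```python
# Hypothetical marker for a variable entry in a term tree (an assumption of this sketch).
VAR = object()


def matches(pattern, tree):
    """Return True if the simplified term tree `pattern` matches `tree`.

    pattern: list of (label, subpattern) pairs; an entry (VAR, None) is a variable.
    tree:    list of (label, subtree) pairs, children in left-to-right order.
    """
    k, m = len(pattern), len(tree)
    # ok[i][j] is True when pattern[i:] can account for exactly tree[j:].
    ok = [[False] * (m + 1) for _ in range(k + 1)]
    ok[k][m] = True  # nothing left on either side
    for i in range(k - 1, -1, -1):
        label, sub = pattern[i]
        for j in range(m - 1, -1, -1):
            if label is VAR:
                # The variable absorbs tree[j], then either ends here
                # or continues absorbing the next child.
                ok[i][j] = ok[i + 1][j + 1] or ok[i][j + 1]
            else:
                t_label, t_sub = tree[j]
                # An ordinary edge must agree on the label and match recursively.
                ok[i][j] = (
                    label == t_label
                    and ok[i + 1][j + 1]
                    and matches(sub, t_sub)
                )
    return ok[0][0]


# Term tree: edge "a", then a variable, then edge "b" leading to an edge "c".
t = [("a", []), (VAR, None), ("b", [("c", [])])]

# T1 is obtained from t by replacing the variable with two child subtrees.
T1 = [("a", []), ("x", [("y", [])]), ("z", []), ("b", [("c", [])])]
# T2 leaves nothing for the variable to be replaced by.
T2 = [("a", []), ("b", [("c", [])])]
# T3 disagrees with t on an edge label below "b".
T3 = [("a", []), ("x", []), ("b", [("d", [])])]

print(matches(t, T1), matches(t, T2), matches(t, T3))
```

In this sketch the ordering of children matters: the variable can only absorb subtrees lying between the "a" edge and the "b" edge, which mirrors the ordered-children setting described in the abstract.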