The q-gram distance for ordered unlabeled trees

  • Authors:
  • Nobuhito Ohkura; Kouichi Hirata; Tetsuji Kuboyama; Masateru Harao

  • Affiliations:
  • Graduate School of Computer Science and Systems Engineering, Kyushu Institute of Technology, Japan; Department of Artificial Intelligence, Kyushu Institute of Technology, Japan; Center for Collaborative Research, University of Tokyo, Japan; Department of Artificial Intelligence, Kyushu Institute of Technology, Japan

  • Venue:
  • DS'05 Proceedings of the 8th international conference on Discovery Science
  • Year:
  • 2005

Abstract

In this paper, we investigate the q-gram distance for ordered unlabeled trees (trees, for short). First, we formulate a q-gram as a tree with $q$ nodes that is isomorphic to a line graph, and we define the q-gram distance between two trees analogously to the q-gram distance between two strings. Then, using the depth sequence based on postorder, we design the algorithm EnumGram, which enumerates all q-grams in a tree $T$ with $n$ nodes in $O(n^{2})$ time and $O(q)$ space. Furthermore, we improve it to the algorithm LinearEnumGram, which runs in $O(qn)$ time and $O(qd)$ space, where $d$ is the depth of $T$. Hence, we can evaluate the q-gram distance $D_{q}(T_{1}, T_{2})$ between $T_{1}$ and $T_{2}$ in $O(q \max\{n_{1}, n_{2}\})$ time and $O(q \max\{d_{1}, d_{2}\})$ space, where $n_{i}$ and $d_{i}$ are the number of nodes in $T_{i}$ and the depth of $T_{i}$, respectively. Finally, we show that the q-gram distance $D_{q}(T_{1}, T_{2})$ and the edit distance $E(T_{1}, T_{2})$ satisfy $D_{q}(T_{1}, T_{2}) \leq (gl+1)E(T_{1}, T_{2})$, where $g = \max\{g_{1}, g_{2}\}$, $l = \max\{l_{1}, l_{2}\}$, $g_{i}$ is the degree of $T_{i}$, and $l_{i}$ is the number of leaves in $T_{i}$. In particular, for the top-down tree edit distance $F(T_{1}, T_{2})$, this result implies that $D_{q}(T_{1}, T_{2}) \leq \min\{g^{q-2}, l - 1\} F(T_{1}, T_{2})$.
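
The abstract defines the tree q-gram distance by analogy with the q-gram distance between two strings. As background only, the string version can be sketched as follows: each string is summarized by how often every length-q substring occurs, and the distance is the sum of the absolute differences of those counts. This is a minimal illustration of the standard string definition, not the paper's EnumGram or LinearEnumGram algorithms; the function names `qgram_profile` and `qgram_distance` are hypothetical.

```python
from collections import Counter


def qgram_profile(s: str, q: int) -> Counter:
    """Count the occurrences of every length-q substring of s."""
    if q <= 0:
        raise ValueError("q must be positive")
    # A string shorter than q has no q-grams, so the range is empty.
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def qgram_distance(x: str, y: str, q: int) -> int:
    """Sum of absolute count differences over all q-grams of x or y."""
    px, py = qgram_profile(x, q), qgram_profile(y, q)
    # A Counter returns 0 for a q-gram it has not seen.
    return sum(abs(px[g] - py[g]) for g in px.keys() | py.keys())


if __name__ == "__main__":
    print(qgram_distance("abcab", "abcb", 2))
```

For "abcab" and "abcb" with q = 2, the profiles are {ab: 2, bc: 1, ca: 1} and {ab: 1, bc: 1, cb: 1}, so the distance is 1 + 0 + 1 + 1 = 3. The tree setting of the paper replaces substrings with path-shaped subtrees of q nodes.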