Similarity join on XML based on k-generation set distance

  • Authors:
  • Yue Wang; Hongzhi Wang; Yang Wang; Hong Gao

  • Affiliations:
  • The School of Computer Science and Technology, Harbin Institute of Technology, China; The School of Computer Science and Technology, Harbin Institute of Technology, China; The School of Computer Science and Technology, Harbin Institute of Technology, China; The School of Computer Science and Technology, Harbin Institute of Technology, China

  • Venue:
  • WAIM'11 Proceedings of the 2011 international conference on Web-Age Information Management
  • Year:
  • 2011

Abstract

Similarity join is now widely applied, because data items that represent the same real-world object may differ owing to varying conventions. A further motivation is that traditional similarity-join methods are inefficient. A method that offers both high efficiency and high join quality is therefore needed. In this paper, we propose two new edit operations (reversing and mapping), together with similarity-join algorithms based on a newly defined measure. In our method, computing the tree edit distance is replaced by computing the k-generation set distance between trees, which substantially simplifies the join process. The time complexity of our method is O(n²), where n is the tree size. We prove that our method has several advantages over other methods, and it also scales to large data sets.
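The abstract does not define the k-generation set or its distance, so the following is only a minimal sketch under loudly stated assumptions, not the authors' algorithm. It assumes that a tree's k-generation set is the multiset of subtrees obtained by truncating every node's subtree to k generations below it (with sibling order ignored), and that the distance is the size of the multiset symmetric difference. All function names (`truncated_signature`, `k_generation_set`, `k_generation_distance`, `similarity_join`) and the threshold `tau` are hypothetical. The join is a naive nested loop with no filtering.

```python
from collections import Counter
from itertools import product

# Hypothetical representation: a tree node is a (label, children) pair,
# where children is a tuple of nodes. Labels are assumed to contain no
# parentheses or commas, so the signature strings below stay unambiguous.

def truncated_signature(node, k):
    """Canonical string for `node` plus its descendants up to k generations.

    ASSUMPTION: children signatures are sorted, so sibling order is ignored.
    """
    label, children = node
    if k == 0 or not children:
        return label
    kids = sorted(truncated_signature(c, k - 1) for c in children)
    return label + "(" + ",".join(kids) + ")"

def k_generation_set(root, k):
    """ASSUMED k-generation set: multiset of truncated signatures, one per node."""
    result = Counter()
    stack = [root]
    while stack:
        node = stack.pop()
        result[truncated_signature(node, k)] += 1
        stack.extend(node[1])
    return result

def set_distance(a, b):
    """Size of the multiset symmetric difference of two Counters."""
    return sum(((a - b) + (b - a)).values())

def k_generation_distance(t1, t2, k):
    """ASSUMED distance between two trees' k-generation sets."""
    return set_distance(k_generation_set(t1, k), k_generation_set(t2, k))

def similarity_join(left, right, k, tau):
    """Naive nested-loop join: index pairs whose distance is at most tau."""
    sets_l = [k_generation_set(t, k) for t in left]
    sets_r = [k_generation_set(t, k) for t in right]
    return [
        (i, j)
        for (i, a), (j, b) in product(enumerate(sets_l), enumerate(sets_r))
        if set_distance(a, b) <= tau
    ]
```

For example, with k = 1, the tree `book(title, author, year)` and the tree `book(title, author)` differ in the root signatures `book(author,title,year)` and `book(author,title)` and in the leaf `year`, so the sketch reports a distance of 3. The nested loop compares every pair of trees; a practical join would add filtering or indexing, which this sketch omits.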