Efficient Similarity Search for Tree-Structured Data

  • Authors:
  • Guoliang Li; Xuhui Liu; Jianhua Feng; Lizhu Zhou

  • Affiliations:
  • Department of Computer Science and Technology, Tsinghua University, Beijing, China 100084 (all authors)

  • Venue:
  • SSDBM '08: Proceedings of the 20th International Conference on Scientific and Statistical Database Management
  • Year:
  • 2008

Abstract

Tree-structured data are becoming ubiquitous, and manipulating them based on similarity is essential for many applications. Although similarity search on textual data has been extensively studied, searching for similar trees is still an open problem due to the high complexity of computing the similarity between trees, especially for large numbers of trees. In this paper, we propose to transform tree-structured data into strings with a one-to-one mapping. We prove that the edit distance of the corresponding strings forms a bound for the similarity measures between trees, including tree edit distance, largest common subtrees and smallest common super-trees. Based on this theoretical analysis, we can employ any existing algorithm for approximate string search for effective similarity search on trees. Moreover, we embed the bound into a filter-and-refine framework to facilitate similarity search on tree-structured data. The experimental results show that our algorithm achieves high performance and significantly outperforms state-of-the-art methods. Our method is especially suitable for accelerating similarity query processing on large numbers of trees in massive datasets.
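
The filter-and-refine idea can be illustrated with a minimal sketch. The abstract does not specify the paper's one-to-one tree-to-string mapping, so this example substitutes a different, well-known transformation: the preorder label sequence of an ordered labeled tree. With unit costs, each node rename, insertion, or deletion changes that sequence by exactly one string edit, so the string edit distance never exceeds the tree edit distance and can safely prune candidates. The tree encoding and all function names (`preorder`, `string_edit_distance`, `tree_edit_distance`, `similarity_search`) are hypothetical choices for illustration, and the exact tree edit distance used in the refine step is a simple memoized recursion suitable only for small trees.

```python
from functools import lru_cache

# Assumed encoding: a tree is a nested tuple (label, (child, child, ...)).

def preorder(tree):
    """Return the node labels of `tree` in preorder."""
    label, children = tree
    labels = [label]
    for child in children:
        labels.extend(preorder(child))
    return labels

def string_edit_distance(a, b):
    """Levenshtein distance between two sequences, unit costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # match or substitute
        prev = cur
    return prev[-1]

def forest_size(forest):
    """Number of nodes in a forest (a tuple of trees)."""
    return sum(1 + forest_size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_edit_distance(f1, f2):
    """Exact edit distance between two ordered forests, unit costs."""
    if not f1:
        return forest_size(f2)
    if not f2:
        return forest_size(f1)
    (l1, c1), (l2, c2) = f1[-1], f2[-1]  # rightmost trees
    return min(
        forest_edit_distance(f1[:-1] + c1, f2) + 1,  # delete rightmost root of f1
        forest_edit_distance(f1, f2[:-1] + c2) + 1,  # insert rightmost root of f2
        forest_edit_distance(f1[:-1], f2[:-1])       # map the two roots to each other
        + forest_edit_distance(c1, c2) + (l1 != l2),
    )

def tree_edit_distance(t1, t2):
    return forest_edit_distance((t1,), (t2,))

def similarity_search(query, trees, tau):
    """Return the trees whose tree edit distance to `query` is at most `tau`."""
    query_seq = preorder(query)
    results = []
    for tree in trees:
        # Filter: the string distance is a lower bound, so pruning is safe.
        if string_edit_distance(query_seq, preorder(tree)) > tau:
            continue
        # Refine: verify surviving candidates with the exact distance.
        if tree_edit_distance(query, tree) <= tau:
            results.append(tree)
    return results

query = ("a", (("b", ()), ("c", ())))
candidates = [
    ("a", (("b", ()), ("d", ()))),                # one rename away from the query
    ("a", (("b", (("c", ()),)),)),                # same preorder labels, different shape
    ("x", (("y", ()), ("z", ()), ("w", ()))),     # pruned by the filter
]
matches = similarity_search(query, candidates, tau=1)
```

In this toy run the second candidate passes the filter, because its preorder sequence equals the query's, but is rejected in the refine step. This is the reason a lower bound alone cannot answer the query: the filter only reduces how many exact tree comparisons are needed.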