Simple fast algorithms for the editing distance between trees and related problems
SIAM Journal on Computing
Approximate string-matching with q-grams and maximal matches
Theoretical Computer Science - Selected papers of the Combinatorial Pattern Matching School
Optimal multi-step k-nearest neighbor search
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
The Tree-to-Tree Correction Problem
Journal of the ACM (JACM)
A guided tour to approximate string matching
ACM Computing Surveys (CSUR)
Efficient and tumble similar set retrieval
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Efficient Index Structures for String Databases
Proceedings of the 27th International Conference on Very Large Data Bases
Approximate String Joins in a Database (Almost) for Free
Proceedings of the 27th International Conference on Very Large Data Bases
Computing the Edit-Distance between Unrooted Ordered Trees
ESA '98 Proceedings of the 6th Annual European Symposium on Algorithms
Robust and efficient fuzzy match for online data cleaning
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Efficient set joins on similarity predicates
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Similarity evaluation on tree-structured data
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Approximate matching of hierarchical data using pq-grams
VLDB '05 Proceedings of the 31st international conference on Very large data bases
n-gram/2L: a space and time efficient two-level n-gram inverted index structure
VLDB '05 Proceedings of the 31st international conference on Very large data bases
A survey on tree edit distance and related problems
Theoretical Computer Science
A Primitive Operator for Similarity Joins in Data Cleaning
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Efficient exact set-similarity joins
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Scaling up all pairs similarity search
Proceedings of the 16th international conference on World Wide Web
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Efficient Merging and Filtering Algorithms for Approximate String Searches
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Fast Indexes and Algorithms for Set Similarity Selection Queries
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Hi-index | 0.00 |
Tree-structured data are becoming ubiquitous nowadays and manipulating them based on similarity is essential for many applications. Although similarity search on textual data has been extensively studied, searching for similar trees is still an open problem due to the high complexity of computing the similarity between trees, especially for large numbers of tress. In this paper, we propose to transform tree-structured data into strings with a one-to-one mapping. We prove that the edit distance of the corresponding strings forms a bound for the similarity measures between trees, including tree edit distance, largest common subtrees and smallest common super-trees. Based on the theoretical analysis, we can employ any existing algorithm of approximate string search for effective similarity search on trees. Moreover, we embed the bound into a filter-and-refine framework for facilitating similarity search on tree-structured data. The experimental results show that our algorithm achieves high performance and outperforms state-of-the-art methods significantly. Our method is especially suitable for accelerating similarity query processing on large numbers of trees in massive datasets.