Similarity joins are widely used because data items that represent the same real-world object may differ when they are recorded under different conventions. Traditional methods, however, are inefficient, so a method that offers both high efficiency and high join quality is needed. In this paper, we propose two new edit operations, reversing and mapping, together with similarity join algorithms based on the newly defined measure. In our method, computing the tree edit distance is replaced by computing the k-generation set distance between trees, which greatly simplifies the join process. The time complexity of our method is O(n^2), where n is the tree size. We demonstrate that our method has advantages over existing methods and that it scales to large data sets.