The problem of generating a cost-minimal edit script between two trees has many important applications. However, finding such a script is computationally hard, so only approximate methods scale. Various approximate solutions have been proposed recently, but most of them still have quadratic or worse runtime complexity in the tree size and therefore do not scale well either. The only solutions with log-linear runtime complexity rely on simple matching algorithms that find corresponding subtrees only when those subtrees are equal. Such solutions are not robust: small changes in the leaves, which occur frequently, make every subtree containing a changed leaf unequal and thus prevent large portions of the trees from being matched. This problem could be avoided by searching for similar rather than equal subtrees, but current similarity approaches are too costly and also exhibit quadratic complexity. Hence, no robust log-linear method currently exists. We propose the random walk similarity (RWS) measure, which can be used to find similar subtrees rapidly. We use this measure to build the RWS-Diff algorithm, which computes an approximately cost-minimal edit script in log-linear time while retaining the robustness of a similarity-based approach. Our evaluation shows that random walk similarity drastically increases edit script quality and robustness while maintaining a runtime comparable to that of simple matching approaches.
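
The fragility of equality-based matching mentioned in the abstract can be illustrated with a minimal Python sketch. It hashes every subtree bottom-up and counts how many subtrees of one tree have an exact counterpart in the other. This is not the RWS-Diff algorithm or any published matcher; `Node`, `subtree_hashes`, and `count_exact_matches` are hypothetical names introduced only for this illustration, and the example trees are made up.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical ordered labeled tree node (illustration only)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def subtree_hashes(root: Node) -> List[Tuple[Node, str]]:
    """Return (node, digest) pairs for every node, in post-order.

    A node's digest depends on its label and on the digests of all its
    children, so two subtrees get the same digest only if they are equal
    (ignoring hash collisions and labels that contain '(' , ',' or ')').
    """
    out: List[Tuple[Node, str]] = []

    def visit(node: Node) -> str:
        child_digests = [visit(child) for child in node.children]
        payload = node.label + "(" + ",".join(child_digests) + ")"
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        out.append((node, digest))
        return digest

    visit(root)
    return out


def count_exact_matches(a: Node, b: Node) -> int:
    """Count subtrees of `a` that occur unchanged somewhere in `b`."""
    digests_in_b = {digest for _, digest in subtree_hashes(b)}
    return sum(1 for _, digest in subtree_hashes(a) if digest in digests_in_b)


# Two small made-up documents that differ in a single leaf (p2 vs. pX).
old = Node("doc", [Node("sec", [Node("p1"), Node("p2")]),
                   Node("sec", [Node("p3")])])
new = Node("doc", [Node("sec", [Node("p1"), Node("pX")]),
                   Node("sec", [Node("p3")])])

print(count_exact_matches(old, old))  # all 6 subtrees match themselves
print(count_exact_matches(old, new))  # only 3 of 6 subtrees still match
```

In this toy case a single changed leaf invalidates the digest of every ancestor up to the root, so half of the subtrees can no longer be matched by equality. A similarity-based measure of the kind the abstract proposes is meant to tolerate such small changes.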