An Efficient Algorithm to Compute Differences between Structured Documents

Authors: Kyong-Ho Lee, Yoon-Chul Choy, Sung-Bae Cho
Affiliations: IEEE Computer Society (all three authors)
Venue: IEEE Transactions on Knowledge and Data Engineering
Year: 2004

Abstract

SGML/XML are having a profound impact on data modeling and processing. This paper presents an efficient algorithm for computing the differences between an old and a new version of an SGML/XML document. The difference between the two versions can be regarded as an edit script that transforms one document tree into the other. The proposed algorithm hybridizes bottom-up and top-down methods: the matching relationships between nodes in the two versions are produced bottom-up, and a top-down breadth-first search then computes the edit script. Matching is faster because the algorithm does not need to examine every node for a possible match. Furthermore, the algorithm detects structurally meaningful changes, such as moving or copying a subtree, as well as simple changes to individual nodes, namely insertion, deletion, and update.
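The two phases described above can be illustrated with a short Python sketch. It is a simplified stand-in, not the authors' algorithm: the names `Node`, `postorder`, `path`, `match_bottom_up` and `edit_script`, and the matching rules are all hypothetical. In this sketch, leaves are paired by identical label and value, internal nodes are matched by a vote of their matched children's old parents, and leftover same-label leaves under matched parents are paired as value updates. The second phase walks the new tree top-down in breadth-first order to emit insert, update and move operations, then lists deletions for unmatched old nodes. Copy detection is not reproduced, and neither is the optimization that lets the paper's algorithm avoid examining every node. A move is reported only when a node's parent changes, not when siblings are merely reordered.

```python
from __future__ import annotations
from collections import Counter, deque
from dataclasses import dataclass, field

@dataclass(eq=False)  # identity-based hashing, so nodes can be dict keys
class Node:
    label: str
    value: str | None = None
    children: list[Node] = field(default_factory=list)
    parent: Node | None = field(default=None, repr=False)

    def add(self, *kids: Node) -> Node:
        for k in kids:
            k.parent = self
            self.children.append(k)
        return self

def postorder(n: Node):
    for c in n.children:
        yield from postorder(c)
    yield n

def path(n: Node) -> str:
    """/label[i]/... where i is the 1-based index among all siblings."""
    if n.parent is None:
        return f"/{n.label}"
    return f"{path(n.parent)}/{n.label}[{n.parent.children.index(n) + 1}]"

def match_bottom_up(old: Node, new: Node) -> dict[Node, Node]:
    """Map new nodes to old nodes, leaves first (hypothetical rules, not the paper's)."""
    leaves: dict[tuple, list[Node]] = {}
    for o in postorder(old):
        if not o.children:
            leaves.setdefault((o.label, o.value), []).append(o)
    m: dict[Node, Node] = {}
    used: set[Node] = set()
    for n in postorder(new):  # every child is visited before its parent
        if not n.children:
            # Leaf: first unused old leaf with the same label and value.
            cand = next((o for o in leaves.get((n.label, n.value), [])
                         if o not in used), None)
        else:
            # Internal node: the old parents of its matched children vote.
            votes = Counter(m[c].parent for c in n.children if c in m)
            cand = next((o for o, _ in votes.most_common()
                         if o is not None and o not in used and o.label == n.label), None)
        if cand is not None:
            m[n] = cand
            used.add(cand)
    if new not in m and old not in used and old.label == new.label:
        m[new] = old  # pair the roots even when nothing below them matched
        used.add(old)
    # Leftover same-label leaves under matched parents are paired as value updates.
    for n, o in list(m.items()):
        spare = [c for c in o.children if not c.children and c not in used]
        for c in n.children:
            if c.children or c in m:
                continue
            hit = next((s for s in spare if s.label == c.label), None)
            if hit is not None:
                m[c] = hit
                used.add(hit)
                spare.remove(hit)
    return m

def edit_script(old: Node, new: Node) -> list[tuple]:
    """Top-down, breadth-first pass over the new tree that emits edit operations."""
    m = match_bottom_up(old, new)
    ops: list[tuple] = []
    queue = deque([new])
    while queue:
        n = queue.popleft()
        queue.extend(n.children)
        o = m.get(n)
        if o is None:
            ops.append(("insert", path(n), n.value))  # BFS emits parents before children
            continue
        if o.value != n.value:
            ops.append(("update", path(o), n.value))
        if n.parent is not None and o.parent is not m.get(n.parent):
            ops.append(("move", path(o), path(n)))  # matched node has a new parent
    matched = set(m.values())
    ops += [("delete", path(o)) for o in postorder(old) if o not in matched]
    return ops

old = Node("doc").add(Node("title", "XML diff"),
                      Node("sec").add(Node("p", "alpha"), Node("p", "beta")),
                      Node("sec").add(Node("p", "gamma"), Node("p", "omega")))
new = Node("doc").add(Node("title", "XML diff (rev)"),
                      Node("sec").add(Node("p", "alpha")),
                      Node("sec").add(Node("p", "gamma"), Node("p", "beta")),
                      Node("sec").add(Node("p", "delta")))
for op in edit_script(old, new):
    print(op)
```

On the sample trees, the sketch reports the title update, the inserted third section and its paragraph, the move of the "beta" paragraph into the section that now holds "gamma", and the deletion of "omega". Paths for old nodes refer to the old tree and paths for new nodes to the new tree, so the output describes the differences rather than a script whose positions stay valid while it is applied.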