diffX: an algorithm to detect changes in multi-version XML documents

  • Authors:
  • Raihan Al-Ekram;Archana Adma;Olga Baysal

  • Affiliations:
  • School of Computer Science, University of Waterloo;School of Computer Science, University of Waterloo;School of Computer Science, University of Waterloo

  • Venue:
  • CASCON '05 Proceedings of the 2005 conference of the Centre for Advanced Studies on Collaborative research
  • Year:
  • 2005

Abstract

This paper presents the diffX algorithm for detecting changes between two versions of an XML document. The identified changes are reported as a script of edit operations that, when applied to the first version of the document, produces the second version. The goal is to optimize the runtime of mapping the nodes between the two versions and to minimize the size of the edit script. To achieve this, an isolated tree fragment mapping technique iteratively identifies the largest matching tree fragments between the tree representations of the two versions. The mapping technique is robust enough to handle differences in both the structure and the content of the two trees. The edit script generated from the mapping respects the different order sensitivity of elements and attributes in the XML data model. The primitives for the edit script comprise both atomic (node) and non-atomic (subtree) edit operations natural to XML document modification. The runtime of the algorithm is O(n^2).
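To make the idea of an edit script concrete, the following is a minimal Python sketch of applying a script of atomic (node) and non-atomic (subtree) operations to a first version of a document to obtain the second. It is not the diffX algorithm, and it does not compute the script. The operation names (`update_text`, `set_attr`, `insert_subtree`, `delete_subtree`), the index-path addressing scheme, and the sample document are all assumptions made for illustration; the abstract does not specify the paper's actual primitives.

```python
import xml.etree.ElementTree as ET


def resolve(root, path):
    """Follow a list of child indexes from the root down to a node."""
    node = root
    for index in path:
        node = node[index]
    return node


def apply_script(root, script):
    """Apply each (hypothetical) edit operation in order, in place."""
    for op in script:
        kind = op[0]
        if kind == "update_text":        # atomic: change one node's text
            _, path, text = op
            resolve(root, path).text = text
        elif kind == "set_attr":         # atomic: add or change an attribute
            _, path, name, value = op
            resolve(root, path).set(name, value)
        elif kind == "insert_subtree":   # non-atomic: insert a whole fragment
            _, parent_path, position, fragment = op
            resolve(root, parent_path).insert(position, ET.fromstring(fragment))
        elif kind == "delete_subtree":   # non-atomic: drop a node and its descendants
            _, path = op
            parent = resolve(root, path[:-1])
            parent.remove(parent[path[-1]])
        else:
            raise ValueError(f"unknown operation: {kind}")
    return root


# Version 1 of an illustrative document.
v1 = ET.fromstring('<book id="1"><title>XML</title><note>old</note></book>')

# An illustrative edit script; paths are lists of child indexes from the root.
script = [
    ("update_text", [0], "XML Diffing"),
    ("set_attr", [], "year", "2005"),
    ("delete_subtree", [1]),
    ("insert_subtree", [], 1, "<author><name>A. Author</name></author>"),
]

v2 = apply_script(v1, script)
result = ET.tostring(v2, encoding="unicode")
print(result)
```

In this sketch, child elements are addressed by position because element order is significant in XML, while attributes are addressed by name because their order is not. The subtree operations move a whole fragment in one step, which is one way a script can stay shorter than an equivalent sequence of single-node operations.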