diffX: an algorithm to detect changes in multi-version XML documents

Authors:
Raihan Al-Ekram;Archana Adma;Olga Baysal
Affiliations:
School of Computer Science, University of Waterloo;School of Computer Science, University of Waterloo;School of Computer Science, University of Waterloo
Venue:
CASCON '05 Proceedings of the 2005 conference of the Centre for Advanced Studies on Collaborative research
Year:
2005

Abstract

This paper presents the diffX algorithm for detecting changes between two versions of an XML document. The identified changes are reported as a script of edit operations that, when applied to the first version of the document, produces the second version. The goal is to optimize the runtime of mapping nodes between the two versions and to minimize the size of the edit script. To achieve this, an isolated tree fragment mapping technique iteratively identifies the largest matching tree fragments between the tree representations of the two versions. The mapping technique is robust enough to handle differences in both the structure and the content of the two trees. The edit script generated from the mapping respects the different order sensitivity of elements and attributes in the XML data model. The primitives of the edit script comprise both atomic (node) and non-atomic (subtree) edit operations natural to XML document modification. The runtime of the algorithm is O(n^2).
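
To make the notion of an edit script concrete, the following is a minimal sketch in Python. It is not the diffX algorithm and does not compute a mapping. It only shows how a hand-written script mixing atomic (node) and non-atomic (subtree) operations can turn one version of a document into another. The operation names (update_text, set_attr, insert_subtree, delete_subtree), the index-path addressing, and the sample documents are all assumptions made for illustration and do not come from the paper. The comparison helper treats child elements as ordered and attributes as unordered, which is the order-sensitivity distinction the abstract refers to.

```python
import xml.etree.ElementTree as ET


def resolve(root, path):
    """Follow a list of child indexes from the root down to a node."""
    node = root
    for index in path:
        node = node[index]
    return node


def apply_script(root, script):
    """Apply each (hypothetical) edit operation in order, mutating the tree."""
    for op in script:
        kind = op[0]
        if kind == "update_text":        # atomic: change one node's text
            _, path, text = op
            resolve(root, path).text = text
        elif kind == "set_attr":         # atomic: add or change one attribute
            _, path, name, value = op
            resolve(root, path).set(name, value)
        elif kind == "insert_subtree":   # non-atomic: insert a whole fragment
            _, parent_path, position, fragment = op
            resolve(root, parent_path).insert(position, ET.fromstring(fragment))
        elif kind == "delete_subtree":   # non-atomic: remove a whole fragment
            _, parent_path, position = op
            del resolve(root, parent_path)[position]
        else:
            raise ValueError(f"unknown operation: {kind}")
    return root


def same_tree(a, b):
    """Compare child elements in order, but attributes as an unordered mapping."""
    return (
        a.tag == b.tag
        and (a.text or "").strip() == (b.text or "").strip()
        and a.attrib == b.attrib
        and len(a) == len(b)
        and all(same_tree(x, y) for x, y in zip(a, b))
    )


# Illustrative documents (assumed, not taken from the paper).
V1 = ('<book id="1" lang="en"><title>XML</title>'
      '<author>Ann</author><note><p>old</p></note></book>')
V2 = ('<book lang="en" id="2"><title>XML Diff</title>'
      '<chapter><p>new</p></chapter><author>Ann</author></book>')

# Hand-written edit script: two node-level and two subtree-level operations.
SCRIPT = [
    ("set_attr", [], "id", "2"),
    ("update_text", [0], "XML Diff"),
    ("delete_subtree", [], 2),
    ("insert_subtree", [], 1, "<chapter><p>new</p></chapter>"),
]

result = apply_script(ET.fromstring(V1), SCRIPT)
print(same_tree(result, ET.fromstring(V2)))
```

The subtree operations move a whole fragment in a single step, which is one way a script can stay shorter than an equivalent sequence of node-by-node edits.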