A Comparative Evaluation of XML Difference Algorithms with Genomic Data

  • Authors:
  • Cornelia Hedeler; Norman W. Paton

  • Affiliations:
  • School of Computer Science, The University of Manchester, Manchester, UK M13 9PL; School of Computer Science, The University of Manchester, Manchester, UK M13 9PL

  • Venue:
  • SSDBM '08 Proceedings of the 20th international conference on Scientific and Statistical Database Management
  • Year:
  • 2008

Abstract

Genome sequence data and annotations are subject to frequent changes resulting from re-assembly and re-annotation, or community feedback based on experimental evidence, giving rise to new data releases. These releases are rarely accompanied by a description of the changes, making it difficult for biologists working with the data to identify and work through the consequences of the changes that have taken place. This paper explores the extent to which existing XML difference algorithms, namely X-Diff, JXyDiff and 3DM, can be used to identify and document genome changes, in particular investigating: (i) their ability to detect typical changes in genome sequence documents; and (ii) the ease with which the difference report can be used to determine whether genes of interest are affected by changes to the genome. The evaluation compares the performance of the algorithms both with synthetic modifications and in detecting changes in a public genomic database. Typical behaviours of the algorithms are identified and a root cause analysis is carried out.