Detecting Data and Schema Changes in Scientific Documents

  • Authors:
  • Nabil Adam, Igg Adiwijaya, Terence Critchlow, Ron Musick

  • Venue:
  • ADL '00: Proceedings of the IEEE Advances in Digital Libraries 2000
  • Year:
  • 2000

Abstract

Data stored in a data warehouse must be kept consistent and up to date with respect to the underlying information sources. If changes in these sources can be detected, identified, and categorized, only the modified data needs to be transferred to and entered into the warehouse; the alternative, periodically reloading from scratch, is clearly inefficient. When the schema of an information source changes, all components that interact with, or make use of, data originating from that source must be updated to conform. The change detection problem is the problem of detecting data and schema changes by comparing two versions of the same semi-structured document. In this paper, we present an approach to detecting data and schema changes in scientific documents. Scientific data is of particular interest because it is normally stored as semi-structured documents and undergoes frequent schema updates. This paper demonstrates the use of graphs to represent scientific documents in particular, and semi-structured documents in general, together with their schemas. It also demonstrates an approach that detects data and schema changes efficiently by performing detection while the document is being parsed.
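The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical indented `label: value` text format (two spaces per nesting level), treats the document graph as a tree, identifies each data node by its path of labels with sibling indices (e.g. `entry[0]/gene[0]`), and takes the schema to be the set of label-only paths (e.g. `entry/gene`). The old version is parsed once into a lookup table; the new version is compared node by node *while it is parsed*, mirroring the idea of merging detection with parsing. The names `parse` and `detect_changes` are illustrative.

```python
from collections.abc import Iterator


def parse(text: str) -> Iterator[tuple[tuple[str, ...], tuple[str, ...], str]]:
    """Yield (instance_path, schema_path, value) for each node of a document
    in a hypothetical indented 'label: value' format (2-space indent).
    Assumes well-formed input: each line is at most one level deeper than
    the previous one."""
    inst: list[str] = []                  # e.g. ["entry[0]", "gene[0]"]
    labels: list[str] = []                # e.g. ["entry", "gene"]
    counters: list[dict[str, int]] = [{}]  # sibling-label counts per depth
    for raw in text.splitlines():
        if not raw.strip():
            continue
        depth = (len(raw) - len(raw.lstrip(" "))) // 2
        label, _, value = raw.strip().partition(":")
        label, value = label.strip(), value.strip()
        # Pop back to this node's parent.
        del inst[depth:]
        del labels[depth:]
        del counters[depth + 1:]
        # Index repeated labels among siblings so each node has a stable id.
        idx = counters[depth].get(label, 0)
        counters[depth][label] = idx + 1
        inst.append(f"{label}[{idx}]")
        labels.append(label)
        counters.append({})               # counts for this node's children
        yield tuple(inst), tuple(labels), value


def detect_changes(old_text: str, new_text: str) -> list[tuple[str, ...]]:
    """Report data changes (insert/update/delete) and schema changes
    (schema_add/schema_remove) between two document versions."""
    old = {i: (s, v) for i, s, v in parse(old_text)}
    old_schema = {s for s, _ in old.values()}
    seen: set[tuple[str, ...]] = set()
    new_schema: set[tuple[str, ...]] = set()
    changes: list[tuple[str, ...]] = []
    # Detection happens during the single pass that parses the new version.
    for i, s, v in parse(new_text):
        seen.add(i)
        new_schema.add(s)
        if i not in old:
            changes.append(("insert", "/".join(i), v))
        elif old[i][1] != v:
            changes.append(("update", "/".join(i), old[i][1], v))
    # Nodes of the old version never encountered in the new one were deleted.
    for i in sorted(old.keys() - seen):
        changes.append(("delete", "/".join(i), old[i][1]))
    for s in sorted(new_schema - old_schema):
        changes.append(("schema_add", "/".join(s)))
    for s in sorted(old_schema - new_schema):
        changes.append(("schema_remove", "/".join(s)))
    return changes


old_doc = "entry: E1\n  gene: abc\n  length: 120\n  note: x\n"
new_doc = "entry: E1\n  gene: abcd\n  length: 120\n  organism: human\n"
for change in detect_changes(old_doc, new_doc):
    print(change)
```

Here the value change in `gene` is reported as a data update, while the disappearance of `note` and the appearance of `organism` show up both as data changes and as schema changes. Keying nodes by label paths is a simplifying assumption; a real scientific format would need a better notion of node identity (for example, accession numbers) to distinguish moved or reordered records from deletions and insertions.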