XML-SIM-CHANGE: structure and content semantic similarity detection among XML document versions

  • Authors:
  • Waraporn Viyanon;Sanjay K. Madria

  • Affiliations:
  • Department of Computer Science, Missouri University of Science and Technology, Rolla, Missouri;Department of Computer Science, Missouri University of Science and Technology, Rolla, Missouri

  • Venue:
  • OTM'10 Proceedings of the 2010 international conference on On the move to meaningful internet systems: Part II
  • Year:
  • 2010

Abstract

XML documents from different sources may represent the same or similar information with respect to content and structure. Being able to integrate similar XML documents is important to query systems and search engines. However, information changes periodically, so it is important to detect the changes among different versions of an XML document and to use the changed information to discover semantic similarity among XML documents. In this paper, we introduce an approach that detects XML similarity by using a change detection mechanism to join XML document versions. In our approach, keys in subtrees play an important role in avoiding unnecessary comparisons of subtrees across different versions of the same XML document. We use a relational database to store XML versions and apply SQL to detect similarities. We show that our approach is highly scalable, has better efficiency in terms of execution time, and provides comparable result quality.
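
The idea of storing XML versions in a relational database and joining subtrees on their keys can be illustrated with a minimal sketch. It is not the authors' implementation: the `leaf` table, the choice of `isbn` as the subtree key, the sample documents, and the function names `load_version` and `detect_changes` are all assumptions made for illustration, and the sketch covers only key-based change detection, not semantic similarity scoring.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Two hypothetical versions of the same document (sample data, not from the paper).
V1 = """<catalog>
  <book><isbn>111</isbn><title>XML Basics</title><price>30</price></book>
  <book><isbn>222</isbn><title>SQL Joins</title><price>25</price></book>
</catalog>"""

V2 = """<catalog>
  <book><isbn>111</isbn><title>XML Basics</title><price>35</price></book>
  <book><isbn>333</isbn><title>Tree Diffing</title><price>40</price></book>
</catalog>"""


def load_version(conn, version, xml_text, key_tag="isbn"):
    """Store each leaf of each top-level subtree as a row tagged with the subtree key."""
    root = ET.fromstring(xml_text)
    for subtree in root:
        key = subtree.findtext(key_tag)
        if key is None:
            continue  # assumption: subtrees without a key are skipped
        for leaf in subtree:
            conn.execute(
                "INSERT INTO leaf (version, subtree_key, tag, value) VALUES (?, ?, ?, ?)",
                (version, key, leaf.tag, (leaf.text or "").strip()),
            )


def detect_changes(conn, old, new):
    """Classify subtrees as changed, inserted, or deleted using key-based SQL."""
    # Changed: same key and same leaf tag in both versions, but a different value.
    # Only subtrees sharing a key are compared, so unrelated subtrees are never paired.
    changed = conn.execute(
        """SELECT DISTINCT a.subtree_key
           FROM leaf a
           JOIN leaf b ON a.subtree_key = b.subtree_key AND a.tag = b.tag
           WHERE a.version = ? AND b.version = ? AND a.value <> b.value
           ORDER BY a.subtree_key""",
        (old, new),
    ).fetchall()
    # Keys present in one version and absent from the other.
    only_in = """SELECT DISTINCT subtree_key FROM leaf
                 WHERE version = ?
                   AND subtree_key NOT IN (SELECT subtree_key FROM leaf WHERE version = ?)
                 ORDER BY subtree_key"""
    inserted = conn.execute(only_in, (new, old)).fetchall()
    deleted = conn.execute(only_in, (old, new)).fetchall()
    return {
        "changed": [row[0] for row in changed],
        "inserted": [row[0] for row in inserted],
        "deleted": [row[0] for row in deleted],
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leaf (version INTEGER, subtree_key TEXT, tag TEXT, value TEXT)")
load_version(conn, 1, V1)
load_version(conn, 2, V2)
result = detect_changes(conn, 1, 2)
print(result)
```

In this sketch the join condition on `subtree_key` is what restricts comparisons to subtrees that share a key. Leaves that are added to or removed from a subtree whose key is unchanged are not reported; a fuller treatment would need to handle those cases as well.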