XML Data Integration Based on Content and Structure Similarity Using Keys

  • Authors:
  • Waraporn Viyanon; Sanjay K. Madria; Sourav S. Bhowmick

  • Affiliations:
  • Department of Computer Science, Missouri University of Science and Technology, Rolla, USA; Department of Computer Science, Missouri University of Science and Technology, Rolla, USA; School of Computer Engineering, Nanyang Technological University, Singapore

  • Venue:
  • OTM '08 Proceedings of the OTM 2008 Confederated International Conferences, CoopIS, DOA, GADA, IS, and ODBASE 2008. Part I on On the Move to Meaningful Internet Systems:
  • Year:
  • 2008

Abstract

This paper proposes a technique for approximately matching XML data on both content and structure by detecting the similarity of subtrees that are clustered semantically using leaf-node parents. Each leaf-node parent is treated as the root of a subtree, which is then traversed recursively bottom-up for matching. First, we take advantage of "keys" when matching subtrees, which dramatically reduces the number of comparisons. Second, we measure the degree of similarity based on the data and the structure of the two XML documents. The results show that our approach finds much more accurate matches, with or without keys present in the XML subtrees. Other approaches run into problems with similarity-matching thresholds because they either ignore the available semantic information or have difficulty handling complex XML data.
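The abstract names three ingredients (subtrees rooted at leaf-node parents, key-based pairing, and a similarity score combining content and structure) without giving definitions or formulas. The following Python sketch shows one way they might fit together; it is an illustration, not the authors' algorithm. Several choices are our own assumptions: a leaf-node parent is taken to be an element whose children are all leaves; a key is a hypothetical child tag passed as `key_tag`; and the score is a hypothetical weighted mix (`alpha`) of Jaccard similarity over (tag, text) leaf pairs and over parent-to-leaf tag edges, accepted only at or above a hypothetical `threshold`. The recursive bottom-up traversal mentioned in the abstract is omitted.

```python
import xml.etree.ElementTree as ET


def leaf_node_parents(root):
    """Collect elements whose children are all leaves (assumed definition)."""
    found = []
    for elem in root.iter():
        children = list(elem)
        if children and all(len(child) == 0 for child in children):
            found.append(elem)
    return found


def leaf_items(subtree):
    """Content view: the set of (leaf tag, stripped leaf text) pairs."""
    return {(leaf.tag, (leaf.text or "").strip()) for leaf in subtree}


def leaf_edges(subtree):
    """Structure view: the set of (parent tag, leaf tag) edges."""
    return {(subtree.tag, leaf.tag) for leaf in subtree}


def jaccard(a, b):
    """Jaccard coefficient |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(s1, s2, alpha=0.5):
    """Hypothetical score: alpha * content similarity + (1 - alpha) * structure similarity."""
    content = jaccard(leaf_items(s1), leaf_items(s2))
    structure = jaccard(leaf_edges(s1), leaf_edges(s2))
    return alpha * content + (1 - alpha) * structure


def key_of(subtree, key_tag):
    """Text of the direct child named key_tag, or None if the subtree has no key."""
    child = subtree.find(key_tag)
    return None if child is None else (child.text or "").strip()


def match_subtrees(doc1, doc2, key_tag=None, threshold=0.5, alpha=0.5):
    """Return (subtree1, subtree2, score) for the best match of each subtree in doc1."""
    left = leaf_node_parents(doc1)
    right = leaf_node_parents(doc2)

    # Index doc2's subtrees by key value so keyed lookups avoid an all-pairs scan.
    index = {}
    if key_tag is not None:
        for s2 in right:
            k = key_of(s2, key_tag)
            if k is not None:
                index.setdefault(k, []).append(s2)

    matches = []
    for s1 in left:
        k = key_of(s1, key_tag) if key_tag is not None else None
        # Keyed subtree: only doc2 subtrees with the same key value are candidates
        # (none at all if no subtree in doc2 carries that key).
        # Unkeyed subtree: fall back to comparing against every subtree in doc2.
        candidates = index.get(k, []) if k is not None else right
        best, best_score = None, 0.0
        for s2 in candidates:
            score = similarity(s1, s2, alpha)
            if score >= threshold and (best is None or score > best_score):
                best, best_score = s2, score
        if best is not None:
            matches.append((s1, best, best_score))
    return matches


doc_a = ET.fromstring(
    "<books><book><isbn>1</isbn><title>XML</title><year>2008</year></book>"
    "<book><isbn>2</isbn><title>Keys</title></book></books>")
doc_b = ET.fromstring(
    "<lib><book><isbn>1</isbn><title>XML</title><year>2009</year></book>"
    "<book><isbn>3</isbn><title>Keys</title></book></lib>")

for s1, s2, score in match_subtrees(doc_a, doc_b, key_tag="isbn"):
    print(key_of(s1, "isbn"), "->", key_of(s2, "isbn"), round(score, 2))  # prints: 1 -> 1 0.75
```

Without `key_tag`, the same call also pairs the two "Keys" books (isbn 2 and isbn 3) with a score of about 0.67, because their titles agree. In this sketch, the key lookup both shrinks the candidate set and rules out pairing subtrees whose key values differ.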