Element matching across data-oriented XML sources using a multi-strategy clustering model

  • Authors:
  • Charnyote Pluempitiwiriyawej; Joachim Hammer

  • Affiliations:
  • Department of Computer Science, Mahidol University, Rama VI Rd., Bangkok 10400, Thailand; Department of Computer and Information Science and Engineering, University of Florida, Box 116120, 301 CSE Building, Gainesville, FL

  • Venue:
  • Data & Knowledge Engineering
  • Year:
  • 2004

Abstract

We describe a family of heuristic-based clustering strategies that support the merging of XML data from multiple sources. As part of this research, we have developed a comprehensive classification of the schematic and semantic conflicts that can occur when reconciling related XML data from multiple sources. Because element clustering is compute-intensive, especially when comparing large numbers of data elements that exhibit great representational diversity, performance is a critical, yet so far neglected, aspect of the merging process. We have developed five heuristics for clustering data in a multi-dimensional metric space. Equivalence of data elements within the individual clusters is determined using several distance functions that calculate the semantic distances among the elements.

The research described in this article was conducted within the context of the Integration Wizard (IWIZ) project at the University of Florida. IWIZ enables users to access and retrieve information from multiple XML-based sources through a consistent, integrated view. The results of our qualitative analysis of the clustering heuristics validate the feasibility of our approach as well as its superior performance when compared to other similarity search techniques.
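The abstract does not spell out the five heuristics or the distance functions, so the following is only a minimal sketch of the general idea under stated assumptions: elements from several sources are first grouped into coarse clusters, and the more expensive equivalence test is then run only on pairs within the same cluster. The single-pass "leader" clustering, the normalized edit-distance function, and every name and parameter (`leader_cluster`, `element_distance`, `equivalent_pairs`, `radius`, `threshold`) are hypothetical stand-ins chosen for illustration, not the authors' method. Each XML element is assumed to be flattened into a dict of field values.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def element_distance(x: dict, y: dict) -> float:
    """Hypothetical distance: mean length-normalized edit distance over the
    fields both elements share. Values lie in [0, 1]. Note: this is not
    guaranteed to satisfy the triangle inequality, so it only approximates
    clustering in a true metric space."""
    keys = sorted(set(x) & set(y))
    if not keys:
        return 1.0
    total = 0.0
    for k in keys:
        a, b = x[k].lower(), y[k].lower()
        longest = max(len(a), len(b)) or 1
        total += levenshtein(a, b) / longest
    return total / len(keys)


def leader_cluster(elements, radius):
    """Single-pass 'leader' clustering (a stand-in heuristic): each element
    joins the first cluster whose leader lies within `radius`; otherwise it
    becomes the leader of a new cluster."""
    clusters = []  # clusters[i][0] is the leader of cluster i
    for e in elements:
        for c in clusters:
            if element_distance(c[0], e) <= radius:
                c.append(e)
                break
        else:
            clusters.append([e])
    return clusters


def equivalent_pairs(clusters, threshold):
    """Compare elements only within each cluster, avoiding an all-pairs scan,
    and report pairs whose distance is at most `threshold`."""
    pairs = []
    for c in clusters:
        for x, y in combinations(c, 2):
            if element_distance(x, y) <= threshold:
                pairs.append((x, y))
    return pairs


if __name__ == "__main__":
    # Hypothetical records from two XML sources, flattened to field dicts.
    source_a = [{"title": "Data Mining", "author": "Han"},
                {"title": "XML Schema", "author": "Smith"}]
    source_b = [{"title": "Data Minning", "author": "Han"},
                {"title": "Database Systems", "author": "Ullman"}]
    clusters = leader_cluster(source_a + source_b, radius=0.5)
    pairs = equivalent_pairs(clusters, threshold=0.2)
    print(len(clusters), len(pairs))
```

The design choice being illustrated is the one the abstract motivates: a cheap grouping pass limits how many expensive pairwise comparisons are needed. In this toy run the misspelled "Data Minning" lands in the same cluster as "Data Mining" and is reported as equivalent, while unrelated records stay in separate clusters.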