Matching XML documents in highly dynamic applications

  • Authors:
  • Adrovane M. Kade; Carlos A. Heuser

  • Affiliations:
  • Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil; Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil

  • Venue:
  • Proceedings of the eighth ACM symposium on Document engineering
  • Year:
  • 2008

Abstract

Highly dynamic applications such as the Web and peer-to-peer systems demand considerable effort in document management. Documents from different sources may contain parts that, despite differing in structure or content, represent the same conceptual information. An essential task in this scenario is identifying complementary or overlapping documents that need to be integrated. In this paper, we deal specifically with documents represented in XML. XML document integration is an important process in highly dynamic applications because the volume of data available in this format is constantly growing. It is also a challenging task: the flexible nature of XML may lead to structural divergences and content conflicts between documents. We present a novel approach to the matching problem, i.e., the problem of determining which parts of two documents contain the same information. Matching is usually the first step of an integration process. Our approach is novel in that it combines similarity information from the content of the elements with information from the structure of the documents. As our experiments confirm, this combination makes our approach capable of dealing with content divergences as well as structural ones.
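
To make the idea of combining content and structure concrete, the following is a minimal illustrative sketch, not the authors' algorithm, which the abstract does not detail. It assumes leaf elements are the units to match, uses word-set Jaccard overlap as the content measure and a sequence ratio over tag paths as the structure measure, and blends them with a weighted sum. The function names, the `weight` and `threshold` parameters, and the sample documents are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def leaf_paths(root):
    """Return (path, text) for every leaf element; path is a list of tag names."""
    found = []

    def walk(node, prefix):
        path = prefix + [node.tag]
        children = list(node)
        if not children:
            found.append((path, (node.text or "").strip()))
        for child in children:
            walk(child, path)

    walk(root, [])
    return found


def content_similarity(text_a, text_b):
    """Jaccard overlap of lowercase word sets (an assumed, simple content measure)."""
    words_a = set(re.findall(r"\w+", text_a.lower()))
    words_b = set(re.findall(r"\w+", text_b.lower()))
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def structure_similarity(path_a, path_b):
    """Similarity of two tag paths (an assumed, simple structure measure)."""
    return SequenceMatcher(None, path_a, path_b).ratio()


def match(doc_a, doc_b, weight=0.5, threshold=0.6):
    """Pair each leaf of doc_a with its best-scoring leaf of doc_b.

    The score is weight * content + (1 - weight) * structure; pairs scoring
    below the threshold are dropped. Both parameters are illustrative.
    """
    leaves_a = leaf_paths(ET.fromstring(doc_a))
    leaves_b = leaf_paths(ET.fromstring(doc_b))

    def score(leaf_a, leaf_b):
        return (weight * content_similarity(leaf_a[1], leaf_b[1])
                + (1 - weight) * structure_similarity(leaf_a[0], leaf_b[0]))

    matches = []
    for leaf_a in leaves_a:
        best = max(leaves_b, key=lambda leaf_b: score(leaf_a, leaf_b), default=None)
        if best is not None and score(leaf_a, best) >= threshold:
            matches.append(("/".join(leaf_a[0]), "/".join(best[0]), score(leaf_a, best)))
    return matches


# Hypothetical sample documents: same information, different structure and content.
DOC_A = "<book><title>Data on the Web</title><author>Serge Abiteboul</author></book>"
DOC_B = ("<book><info><title>Data on the Web: From Relations to XML</title></info>"
         "<writer>Abiteboul, Serge</writer></book>")

if __name__ == "__main__":
    for path_a, path_b, s in match(DOC_A, DOC_B):
        print(f"{path_a} <-> {path_b} (score {s:.2f})")
```

In this sketch neither signal is sufficient alone: the two title elements share only half of their words but sit on similar paths, while the author and writer elements carry different tags but identical words. A weighted sum is only one way to blend the two signals.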