Structural similarity evaluation between XML documents and DTDs

  • Authors:
  • Joe Tekli;Richard Chbeir;Kokou Yetongnon

  • Affiliations:
  • LE2I Laboratory, UMR-CNRS, University of Bourgogne, Dijon Cedex, France;LE2I Laboratory, UMR-CNRS, University of Bourgogne, Dijon Cedex, France;LE2I Laboratory, UMR-CNRS, University of Bourgogne, Dijon Cedex, France

  • Venue:
  • WISE'07 Proceedings of the 8th international conference on Web information systems engineering
  • Year:
  • 2007

Abstract

The automatic processing and management of XML-based data are increasingly popular research issues, owing to the growing use of XML, especially on the Web. Nonetheless, several operations based on the structure of XML data have not yet received much attention. Among these is the process of matching XML documents with XML grammars, which is useful in applications such as document classification, retrieval, and selective dissemination of information. In this paper, we propose an algorithm for measuring the structural similarity between an XML document and a Document Type Definition (DTD), considered the simplest way of specifying structural constraints on XML documents. We consider the various DTD operators that designate constraints on the existence, repeatability, and alternativeness of XML elements and attributes. Our approach is based on the concept of tree edit distance, an effective and efficient means of comparing tree structures, with XML documents and DTDs modeled as ordered labeled trees. It is of polynomial complexity, in contrast to existing exponential algorithms. Classification experiments, conducted on large sets of real and synthetic XML documents, underline our approach's effectiveness, as well as its applicability to large XML repositories and databases.
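
To make the notion of tree edit distance between ordered labeled trees concrete, the following is a minimal sketch of a generic unit-cost edit distance between two such trees. It is not the algorithm proposed in the paper: it ignores the DTD operators entirely and simply compares two plain trees. The tuple representation, the function names, and the 1 / (1 + distance) similarity normalization are all assumptions made for illustration.

```python
from functools import lru_cache

# Assumed representation: a tree is (label, children), where children is a
# tuple of trees. Nested tuples are hashable, so results can be memoized.


def size(tree):
    """Number of nodes in a tree."""
    _label, children = tree
    return 1 + sum(size(child) for child in children)


@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Unit-cost edit distance between two ordered forests (tuples of trees)."""
    if not f1:
        return sum(size(t) for t in f2)  # insert every remaining node
    if not f2:
        return sum(size(t) for t in f1)  # delete every remaining node
    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
    # Delete the rightmost root of f1; its children take its place.
    delete = forest_distance(f1[:-1] + kids1, f2) + 1
    # Insert the rightmost root of f2; its children take its place.
    insert = forest_distance(f1, f2[:-1] + kids2) + 1
    # Map the two rightmost roots onto each other, relabeling if they differ.
    relabel = 0 if label1 == label2 else 1
    match = (forest_distance(f1[:-1], f2[:-1])
             + forest_distance(kids1, kids2)
             + relabel)
    return min(delete, insert, match)


def tree_edit_distance(t1, t2):
    """Edit distance between two single trees."""
    return forest_distance((t1,), (t2,))


def similarity(t1, t2):
    """Illustrative normalization (an assumption): 1 for identical trees."""
    return 1 / (1 + tree_edit_distance(t1, t2))


# Hypothetical document tree and a hypothetical tree-shaped structural template.
doc = ("paper", (("title", ()), ("author", ()), ("author", ())))
template = ("paper", (("title", ()), ("author", ()), ("year", ())))
```

Here `tree_edit_distance(doc, template)` counts the insert, delete, and relabel operations needed to turn one tree into the other. Handling the existence, repeatability, and alternativeness constraints of a real DTD would require extending this comparison, which the sketch does not attempt.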