Structural similarity between XML documents and DTDs

  • Authors:
  • Patrick K. L. Ng; Vincent T. Y. Ng

  • Affiliations:
  • Department of Computing, The Hong Kong Polytechnic University, Hong Kong; Department of Computing, The Hong Kong Polytechnic University, Hong Kong

  • Venue:
  • ICCS'03: Proceedings of the 2003 International Conference on Computational Science, Part III
  • Year:
  • 2003

Abstract

The use of XML documents on the Internet continues to grow. This has created a need to analyze XML documents from heterogeneous sources, where the documents may conform to different DTDs. In this paper, we propose a measure of the structural similarity between XML documents and DTDs that is natural to understand and fast to calculate. The measure is defined as a weighted sum of the local measures of the document's elements, with a weighting scheme based on their subtree sizes. The local measure of an element is defined as its edit distance against its declaration in the DTD, with the declaration viewed as a regular expression. Based on this definition, we propose an algorithm for calculating the edit distance between a string and a regular expression, adapted from the algorithm used for the regular expression matching problem. The advantages of the measure are its natural definition and linear complexity.
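
To make the idea of an edit distance between a string and a regular expression concrete, here is a minimal Python sketch. It is not the algorithm from the paper, whose details the abstract does not give. It assumes a small hypothetical tuple encoding of a DTD content model, builds a Thompson-style NFA from it, and runs a generic shortest-path search over pairs of (input position, NFA state) with unit costs for inserting, deleting, or substituting an element name. The names `build_nfa`, `regex_edit_distance`, and `BOOK` are illustrative, and this formulation does not have the linear complexity the abstract claims.

```python
import heapq
from itertools import count

# Hypothetical regex encoding (an assumption, not from the paper):
# ("sym", name), ("cat", r1, r2), ("alt", r1, r2), ("star", r), ("opt", r)

def build_nfa(regex):
    """Thompson-style construction; returns (start, accept, eps, sym)."""
    ids = count()
    eps = {}  # state -> states reachable by an epsilon move
    sym = {}  # state -> list of (label, next_state)

    def new_state():
        s = next(ids)
        eps[s], sym[s] = [], []
        return s

    def build(node):
        kind = node[0]
        start, accept = new_state(), new_state()
        if kind == "sym":
            sym[start].append((node[1], accept))
        elif kind == "cat":
            s1, a1 = build(node[1])
            s2, a2 = build(node[2])
            eps[start].append(s1)
            eps[a1].append(s2)
            eps[a2].append(accept)
        elif kind == "alt":
            for child in node[1:]:
                s, a = build(child)
                eps[start].append(s)
                eps[a].append(accept)
        elif kind in ("star", "opt"):
            s, a = build(node[1])
            eps[start] += [s, accept]   # enter the body or skip it
            eps[a].append(accept)
            if kind == "star":
                eps[a].append(s)        # loop back for another repetition
        else:
            raise ValueError(f"unknown node kind: {kind!r}")
        return start, accept

    start, accept = build(regex)
    return start, accept, eps, sym

def regex_edit_distance(children, regex):
    """Minimum number of unit-cost edits turning `children` into a match."""
    start, accept, eps, sym = build_nfa(regex)
    n = len(children)
    dist = {(0, start): 0}
    heap = [(0, 0, start)]
    while heap:
        d, i, q = heapq.heappop(heap)
        if d > dist[(i, q)]:
            continue  # stale heap entry
        if i == n and q == accept:
            return d
        moves = [(0, i, t) for t in eps[q]]      # epsilon move, free
        if i < n:
            moves.append((1, i + 1, q))          # delete children[i]
        for label, t in sym[q]:
            moves.append((1, i, t))              # insert `label`
            if i < n:                            # match (0) or substitute (1)
                moves.append((0 if label == children[i] else 1, i + 1, t))
        for cost, j, t in moves:
            if d + cost < dist.get((j, t), float("inf")):
                dist[(j, t)] = d + cost
                heapq.heappush(heap, (d + cost, j, t))
    raise RuntimeError("accept state unreachable")

# Content model of <!ELEMENT book (title, author+, chapter*)>
AUTHOR = ("sym", "author")
BOOK = ("cat", ("sym", "title"),
        ("cat", ("cat", AUTHOR, ("star", AUTHOR)), ("star", ("sym", "chapter"))))

print(regex_edit_distance(["title", "author", "chapter"], BOOK))  # 0
print(regex_edit_distance(["title", "chapter"], BOOK))            # 1
print(regex_edit_distance(["author", "title"], BOOK))             # 2
```

In this sketch the string is the sequence of child element names of one element, which is one plausible reading of the local measure. How the paper turns these distances into a similarity score and combines them using subtree-size weights is not specified in the abstract, so that step is left out.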