XML document similarity measure in terms of the structure and contents

  • Authors:
  • Woosaeng Kim

  • Affiliations:
  • Department of Computer Science, Kwangwoon University, Nowon-Gu, Seoul, Korea

  • Venue:
  • CEA'08 Proceedings of the 2nd WSEAS International Conference on Computer Engineering and Applications
  • Year:
  • 2008

Abstract

XML has become the standard for data representation and exchange on the Internet. With a large number of XML documents on the Web, there is an increasing need to automatically process these structurally rich documents for information retrieval, similarity clustering, and search applications. In this paper, we propose a new method to measure the similarity between XML documents by considering both their structure and their contents. Structural similarity is computed with a simple string-matching technique, and content similarity is computed with weights that take into account the names and positions of elements.
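The abstract does not give the paper's formulas, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. For structure, it assumes the "string" being matched is the pre-order sequence of element names and scores it with Python's `difflib.SequenceMatcher.ratio()`. For content, it assumes each word is keyed by the name of its enclosing element and weighted by that element's position, using an arbitrary `1/depth` weight, and compares documents by cosine similarity. The function names, the `1/depth` weighting, and the `alpha` blending parameter are all hypothetical and are not taken from the paper.

```python
import math
import xml.etree.ElementTree as ET
from collections import defaultdict
from difflib import SequenceMatcher


def tag_sequence(root):
    """Pre-order list of element names, treated as the document's structure string."""
    return [elem.tag for elem in root.iter()]


def structure_similarity(xml_a, xml_b):
    """Assumed structural measure: SequenceMatcher ratio of the tag sequences, in [0, 1]."""
    seq_a = tag_sequence(ET.fromstring(xml_a))
    seq_b = tag_sequence(ET.fromstring(xml_b))
    return SequenceMatcher(None, seq_a, seq_b).ratio()


def weighted_terms(root):
    """Map (element name, word) -> weight.

    Hypothetical weighting: a word in an element at depth d contributes 1/d,
    so text in shallower elements counts more. Tail text is ignored.
    """
    weights = defaultdict(float)

    def walk(elem, depth):
        for word in (elem.text or "").lower().split():
            weights[(elem.tag, word)] += 1.0 / depth
        for child in elem:
            walk(child, depth + 1)

    walk(root, 1)
    return weights


def content_similarity(xml_a, xml_b):
    """Cosine similarity of the weighted (element name, word) vectors."""
    wa = weighted_terms(ET.fromstring(xml_a))
    wb = weighted_terms(ET.fromstring(xml_b))
    dot = sum(w * wb[k] for k, w in wa.items() if k in wb)
    norm_a = math.sqrt(sum(w * w for w in wa.values()))
    norm_b = math.sqrt(sum(w * w for w in wb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def xml_similarity(xml_a, xml_b, alpha=0.5):
    """Hypothetical linear blend of structural and content similarity."""
    return (alpha * structure_similarity(xml_a, xml_b)
            + (1 - alpha) * content_similarity(xml_a, xml_b))


if __name__ == "__main__":
    doc1 = "<book><title>XML retrieval</title><author>Kim</author></book>"
    doc2 = "<book><title>XML clustering</title><author>Kim</author></book>"
    print(structure_similarity(doc1, doc2))  # 1.0 (identical tag sequences)
    print(content_similarity(doc1, doc2))    # ≈ 0.667
    print(xml_similarity(doc1, doc2))
```

Keying content terms by element name means the same word under `<title>` and under `<author>` is treated as two different features, which is one simple way to let element names influence content similarity. A real implementation would likely also normalize words (stemming, stop-word removal) and choose `alpha` empirically.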