A Bloom Filter Based Approach for Evaluating Structural Similarity of XML Documents

  • Authors:
  • Dunlu Peng; Huan Hou; Jing Lu

  • Affiliations:
  • School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai, China 200093 (all authors)

  • Venue:
  • WISM '09 Proceedings of the International Conference on Web Information Systems and Mining
  • Year:
  • 2009

Abstract

Evaluating the structural similarity of XML documents is a key issue in building the core algorithms for XML document clustering, XML classification, and the extraction of a schema or DTD from a corpus of XML documents. This work uses Bloom filters to represent an XML document with two structures: the Tag-based Bloom filter (TBF), which describes an XML document by the tags of its elements, and the Path-based Bloom filter (PBF), which describes the hierarchical structure of the document. Based on these two structures, an approach is developed to evaluate the similarity of XML documents. A group of experiments was conducted to investigate the performance of the proposed approach.
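
The abstract does not give the construction details, so the following is only a minimal sketch of how a tag-based and a path-based Bloom filter could be built and compared. Everything specific here is an assumption rather than the authors' method: the filter size `M`, the number of hash functions `K`, the salted SHA-256 hashing, the use of root-to-element tag paths for the PBF, the bitwise Jaccard ratio, and the weight `w_tag` that combines the two scores.

```python
import hashlib
import xml.etree.ElementTree as ET

M = 256  # bits per filter (assumed, not from the paper)
K = 3    # hash functions per item (assumed, not from the paper)


def _positions(item, m=M, k=K):
    """Derive k bit positions for an item from salted SHA-256 digests."""
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
        yield int.from_bytes(digest[:8], "big") % m


def bloom(items, m=M, k=K):
    """Build a Bloom filter, stored as a Python int used as a bit set."""
    bits = 0
    for item in items:
        for pos in _positions(item, m, k):
            bits |= 1 << pos
    return bits


def tags_and_paths(xml_text):
    """Collect the element tags and the root-to-element tag paths."""
    root = ET.fromstring(xml_text)
    tags, paths = set(), set()
    stack = [(root, "/" + root.tag)]
    while stack:
        node, path = stack.pop()
        tags.add(node.tag)
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return tags, paths


def bit_jaccard(a, b):
    """Ratio of shared set bits to all set bits; 1.0 if both are empty."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 1.0


def similarity(xml_a, xml_b, w_tag=0.5):
    """Weighted mix of a tag-filter score and a path-filter score."""
    tags_a, paths_a = tags_and_paths(xml_a)
    tags_b, paths_b = tags_and_paths(xml_b)
    tag_score = bit_jaccard(bloom(tags_a), bloom(tags_b))
    path_score = bit_jaccard(bloom(paths_a), bloom(paths_b))
    return w_tag * tag_score + (1 - w_tag) * path_score


doc_a = "<lib><book><title/></book></lib>"
doc_b = "<lib><title><book/></title></lib>"
print(similarity(doc_a, doc_a))  # identical documents
print(similarity(doc_a, doc_b))  # same tags, different nesting
```

In this sketch the two documents share every tag, so their tag filters are identical and only the path filter can tell them apart, which is the reason for keeping both structures. Because Bloom filters admit false positives, the scores are approximations whose accuracy depends on the assumed `M` and `K`.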