Evaluating the structural similarity of XML documents is a key issue in building the core algorithms for XML document clustering, XML classification, and the extraction of a schema or DTD from a corpus of XML documents. This work employs Bloom filters to represent an XML document with two structures: a Tag-based Bloom filter (TBF), which describes an XML document by the tags of its elements, and a Path-based Bloom filter (PBF), which describes the hierarchical structure of the document. Based on these two structures, an approach is developed to evaluate the similarity of XML documents. A group of experiments was conducted to investigate the performance of the proposed approach.
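To make the two representations concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the TBF holds the set of element tag names and the PBF holds the set of root-to-node tag paths. The abstract does not give the similarity measure, so the sketch assumes a Jaccard-style ratio of shared set bits, combined with an equal weighting. The filter size `M`, the hash count `K`, the `weight` parameter, and all function names are illustrative assumptions.

```python
import hashlib
import xml.etree.ElementTree as ET

M = 256  # filter size in bits (assumed value)
K = 3    # number of hash functions (assumed value)


def bloom(items, m=M, k=K):
    """Build a Bloom filter over `items`, stored as an integer bitmask."""
    bits = 0
    for item in items:
        for i in range(k):
            # Derive k hash functions by salting SHA-256 with the index i.
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            bits |= 1 << (int.from_bytes(digest[:8], "big") % m)
    return bits


def tags_and_paths(xml_text):
    """Collect the set of element tags and the set of root-to-node tag paths."""
    root = ET.fromstring(xml_text)
    tags, paths = set(), set()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        tags.add(node.tag)
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return tags, paths


def bit_similarity(a, b):
    """Jaccard-style ratio of shared set bits (an assumed measure)."""
    union = a | b
    if union == 0:
        return 1.0
    return bin(a & b).count("1") / bin(union).count("1")


def similarity(xml_a, xml_b, weight=0.5):
    """Combine tag-based (TBF) and path-based (PBF) similarity."""
    tags_a, paths_a = tags_and_paths(xml_a)
    tags_b, paths_b = tags_and_paths(xml_b)
    tag_sim = bit_similarity(bloom(tags_a), bloom(tags_b))
    path_sim = bit_similarity(bloom(paths_a), bloom(paths_b))
    return weight * tag_sim + (1 - weight) * path_sim


if __name__ == "__main__":
    flat = "<a><b/><c/></a>"
    nested = "<a><b><c/></b></a>"
    print(similarity(flat, flat))
    print(similarity(flat, nested))
```

The two sample documents share the same tag set but differ in nesting, so under these assumptions the tag-based filters are identical and any difference in the score comes from the path-based filters. This illustrates why a tag-only representation cannot separate documents that differ only in hierarchy.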