Preparations for Semantics-Based XML Mining
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
DTD-Miner: A Tool for Mining DTD from XML Documents
WECWIS '00 Proceedings of the Second International Workshop on Advance Issues of E-Commerce and Web-Based Information Systems (WECWIS 2000)
XML Document Clustering Using Common XPath
WIRI '05 Proceedings of the International Workshop on Challenges in Web Information Retrieval and Integration
A methodology for clustering XML documents by structure
Information Systems
Similarity Evaluation of XML Documents Based on Weighted Element Tree Model
ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications
Structural similarity evaluation of XML documents based on basic statistics
WISM'12 Proceedings of the 2012 international conference on Web Information Systems and Mining
Measuring the similarity between XML documents is the fundamental task in finding clusters in a collection of XML documents. In this paper, an XML document is modeled as an XML Element Sequence Pattern (XESP), which can be extracted with less time and space than other models such as the tree model and the frequent-paths model. The similarity between XML documents is then measured on their XESPs. Frequent-path pattern models ignore hierarchical information, and tree models ignore semantic information. To avoid both deficiencies, the XESP-based similarity takes into account the semantics of the elements as well as the hierarchical structure of the document. Experimental results show that perfect clustering can be obtained with a properly chosen threshold on the XESP-based similarity.
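The abstract does not define XESP or its similarity formula, so the following is only a minimal sketch under stated assumptions, not the paper's method. It assumes that an element sequence is the pre-order list of (depth, tag) pairs, which keeps hierarchy. It also assumes that element "semantics" can be approximated by a hypothetical synonym table (`SYNONYMS`). For comparing sequences it swaps in a depth-weighted longest common subsequence (LCS), and for grouping it uses a greedy threshold clustering. All names (`element_sequence`, `weight`, `similarity`, `cluster`, `SYNONYMS`) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical synonym table standing in for element semantics;
# the paper's actual semantic measure is not given in the abstract.
SYNONYMS = {"writer": "author", "name": "title"}

def element_sequence(xml_text):
    """Return (depth, normalized tag) pairs in pre-order (document order)."""
    root = ET.fromstring(xml_text)
    seq = []

    def walk(node, depth):
        tag = node.tag.lower()
        seq.append((depth, SYNONYMS.get(tag, tag)))
        for child in node:
            walk(child, depth + 1)

    walk(root, 0)
    return seq

def weight(depth):
    # Hypothetical weighting: elements closer to the root count more.
    return 1.0 / (1 + depth)

def similarity(a, b):
    """Depth-weighted LCS of two element sequences, normalized to [0, 1]."""
    n, m = len(a), len(b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            if a[i] == b[j]:  # same normalized tag at the same depth
                dp[i + 1][j + 1] = dp[i][j] + weight(a[i][0])
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    total = sum(weight(d) for d, _ in a) + sum(weight(d) for d, _ in b)
    return 2 * dp[n][m] / total if total else 1.0

def cluster(docs, threshold=0.8):
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    seqs = [element_sequence(d) for d in docs]
    clusters = []  # each cluster is a list of document indices
    for i, s in enumerate(seqs):
        for c in clusters:
            if similarity(seqs[c[0]], s) >= threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters

docs = [
    "<book><title/><author/></book>",
    "<book><name/><writer/></book>",
    "<order><item><price/></item></order>",
]
print(cluster(docs))  # [[0, 1], [2]]
```

Here the first two documents differ only in element names that the synonym table maps to the same concept, so they land in one cluster. The third document shares no (depth, tag) pairs with them and forms its own cluster. As in the abstract, the result depends on choosing a suitable threshold.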