XML has been extensively used in many information retrieval applications. Clustering, an important data mining technique, has been used to analyze XML data. The key issue in XML clustering is how to measure the similarity between XML documents. Traditional document clustering methods measure similarity using content information only and ignore the structural information contained in XML documents. In this paper, we propose a model called the Structure and Content Vector Model (SCVM) to represent both the structure and the content information in XML documents. Based on this model, we define a similarity measure that can be used to cluster XML documents. Our experimental results show that the proposed model and similarity measure are effective in identifying similar documents when the structural information contained in the XML documents is meaningful. The method can be used to improve the precision and efficiency of XML information retrieval.
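The abstract does not specify how SCVM builds its vectors or which similarity measure it defines, so the following is only an illustrative sketch of the general idea of combining structure and content in one vector, not the authors' method. It assumes each feature is a pair (element path, term), weights features by raw counts, and compares documents with cosine similarity. The names `path_term_vector` and `cosine`, and the sample documents, are hypothetical.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def path_term_vector(xml_text):
    """Count (element path, term) pairs for the text directly inside each element.

    Assumption for illustration: a feature is the root-to-element tag path
    combined with a lowercased whitespace-separated term. Tail text that
    follows a child element is ignored to keep the sketch short.
    """
    root = ET.fromstring(xml_text)
    counts = Counter()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        for term in (node.text or "").lower().split():
            counts[(path, term)] += 1
        for child in node:
            walk(child, path)

    walk(root, "")
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical sample documents.
doc_a = "<book><title>xml clustering</title></book>"
doc_b = "<book><title>xml retrieval</title></book>"   # same structure, partly shared content
doc_c = "<book><note>xml clustering</note></book>"    # same content, different structure

vec_a = path_term_vector(doc_a)
vec_b = path_term_vector(doc_b)
vec_c = path_term_vector(doc_c)

sim_ab = cosine(vec_a, vec_b)
sim_ac = cosine(vec_a, vec_c)
print(sim_ab, sim_ac)
```

Under these assumptions, doc_a and doc_c share no features even though their text is identical, because the terms sit under different element paths. A content-only vector would treat them as the same document, which is the gap a structure-aware representation is meant to close.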