The number of XML documents is increasing rapidly. To analyze the information they represent efficiently, XML document clustering has become an active area of research. The key issue is how to devise the similarity measure between XML documents that the clustering relies on. Because XML documents have a hierarchical structure, a general-purpose document similarity measure is not appropriate for clustering them. In this paper, we propose a novel similarity measure that reduces nesting and repetition across the whole XML document. We then propose an improved edge-set comparison algorithm to calculate the similarity between two XML documents. Our experiments show that the proposed method improves clustering accuracy compared with previous work.
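To make the idea of edge-set comparison concrete, the following is a minimal sketch, not the algorithm proposed in the paper. It assumes that an edge is a (parent tag, child tag) pair and that similarity is the Jaccard ratio of the two edge sets; both choices, as well as the function names `edge_set` and `edge_set_similarity`, are illustrative assumptions. Storing edges in a set collapses repeated elements, which loosely mirrors the reduction of repetition; the paper's nesting reduction is not modeled here.

```python
import xml.etree.ElementTree as ET


def edge_set(xml_text):
    """Return the set of (parent_tag, child_tag) edges of an XML document.

    Assumption: an edge is identified by tag names only, so repeated
    elements under the same parent contribute a single edge.
    """
    root = ET.fromstring(xml_text)
    edges = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node:
            edges.add((node.tag, child.tag))
            stack.append(child)
    return edges


def edge_set_similarity(xml_a, xml_b):
    """Jaccard similarity of two edge sets (an assumed, illustrative measure)."""
    a, b = edge_set(xml_a), edge_set(xml_b)
    if not a and not b:
        # Two documents with no edges (single root element) are treated as identical.
        return 1.0
    return len(a & b) / len(a | b)


doc1 = "<book><title/><author/><author/></book>"
doc2 = "<book><title/><author/><year/></book>"
similarity = edge_set_similarity(doc1, doc2)
```

Here `doc1` and `doc2` share two of three distinct edges, so `similarity` is 2/3. A pairwise similarity of this kind could then be supplied to any clustering method that works from a similarity or distance matrix.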