In this paper, we present a No-Word-Segmentation Hierarchical Clustering Approach (NWSHCA) for clustering Chinese Web search results. The approach first computes the similarity between two documents with a new measure based on a variation of the edit distance, and then generates preliminary clusters using a partitioning clustering method. Next, it ranks all common substrings in each cluster using a cluster-discriminative metric and takes the top K as cluster description labels. Finally, it applies hierarchical agglomerative clustering (HAC) to the top K cluster labels to form a navigational tree. In contrast to most clustering algorithms, NWSHCA can generate overlapping clusters. Experimental results show that the approach is feasible and effective.
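As a rough illustration of the first step, the sketch below computes a character-level similarity between two snippets without any word segmentation. It uses the standard Levenshtein edit distance normalized by the longer string's length; the abstract does not specify the variation of the edit distance that NWSHCA actually uses, so both the normalization and the function names `edit_distance` and `similarity` are assumptions made for illustration only.

```python
def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance over characters (no word segmentation)."""
    # Keep the shorter string as the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 minus distance over the longer length."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


if __name__ == "__main__":
    # Two four-character Chinese snippets that differ in one character.
    print(similarity("中文搜索", "中文检索"))
```

Because the comparison works on individual characters, it applies directly to Chinese text, where word boundaries are not marked by spaces. A score of this kind could then feed a partitioning clustering method, as the abstract describes for the second step.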