A no-word-segmentation hierarchical clustering approach to Chinese web search results

Authors:
Hui Zhang;Liping Zhao;Rui Liu;Deqing Wang
Affiliations:
State Key Lab. of Software Development Environment, Beihang University;State Key Lab. of Software Development Environment, Beihang University;State Key Lab. of Software Development Environment, Beihang University;State Key Lab. of Software Development Environment, Beihang University
Venue:
AIRS'08 Proceedings of the 4th Asia information retrieval conference on Information retrieval technology
Year:
2008

Abstract

In this paper, we present a No-Word-Segmentation Hierarchical Clustering Approach (NWSHCA) to Chinese Web search results. The approach first computes a new similarity measure between two documents, based on a variation of the edit distance, and generates preliminary clusters using a partitioning clustering method. Next, it ranks all common substrings in a cluster using a cluster-discriminative metric and takes the top K as cluster description labels. Finally, it applies hierarchical agglomerative clustering (HAC) to the top K cluster labels to form a navigational tree. In contrast to most clustering algorithms, NWSHCA can generate overlapping clusters. Experimental results show that the approach is feasible and effective.
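
As a rough illustration of the first step, the sketch below computes a character-level similarity between two snippets without any word segmentation. The abstract does not specify the paper's variation of the edit distance, so this assumes the standard Levenshtein distance, normalized by the longer string's length. The function names `edit_distance` and `similarity` are hypothetical and not taken from the paper.

```python
def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance over characters (assumed stand-in
    for the paper's unspecified edit-distance variation)."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Map the distance to [0, 1]; 1.0 means identical strings.
    This normalization is an assumption made for illustration."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest
```

Because the comparison works on individual characters, Chinese text such as `similarity("北京大学", "北京科技大学")` can be scored directly without a segmenter. A score like this could then feed a partitioning clustering step of the kind the abstract describes.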