A no-word-segmentation hierarchical clustering approach to Chinese web search results

  • Authors:
  • Hui Zhang; Liping Zhao; Rui Liu; Deqing Wang

  • Affiliations:
  • State Key Lab. of Software Development Environment, Beihang University (all four authors)

  • Venue:
  • AIRS'08 Proceedings of the 4th Asia information retrieval conference on Information retrieval technology
  • Year:
  • 2008

Abstract

In this paper, we present a No-Word-Segmentation Hierarchical Clustering Approach (NWSHCA) to Chinese Web search results. The approach first computes the similarity between two documents with a new measure based on a variation of the edit distance, and then generates preliminary clusters using a partitioning clustering method. Next, it ranks all common substrings in each cluster using a cluster-discriminative metric and takes the top K as cluster description labels. Finally, it applies hierarchical agglomerative clustering (HAC) to the top K cluster labels to form a navigational tree. In contrast to most clustering algorithms, NWSHCA can generate overlapping clusters. Experimental results show that the approach is feasible and effective.
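The abstract does not specify the edit-distance variation or the cluster-discriminative metric, so the following is only a minimal sketch of the character-level idea, not the authors' method. It assumes the classic Levenshtein distance normalized by the longer string's length as the similarity, and enumerates shared substrings as candidate labels. The function names `edit_distance`, `similarity`, and `common_substrings`, and the `min_len` parameter, are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance computed over characters.

    Assumption: the paper uses a *variation* of edit distance that the
    abstract does not describe; this is the standard textbook version.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def _substrings(s: str, min_len: int) -> set:
    """All contiguous character substrings of s with length >= min_len."""
    return {s[i:j]
            for i in range(len(s))
            for j in range(i + min_len, len(s) + 1)}


def common_substrings(a: str, b: str, min_len: int = 2) -> set:
    """Substrings shared by both texts: candidate labels, no segmentation."""
    return _substrings(a, min_len) & _substrings(b, min_len)


if __name__ == "__main__":
    s1, s2 = "北京大学", "北京航空航天大学"
    print(similarity(s1, s2))
    print(sorted(common_substrings(s1, s2)))
```

Because both functions operate directly on characters, no Chinese word segmenter is needed. The brute-force substring enumeration is quadratic in text length, which is tolerable for short search-result snippets but would likely call for a suffix-based structure on longer documents.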