This study proposes a novel method for grouping and organizing search results. We first apply statistical techniques to term co-occurrence information in a corpus to extract bi-grams, and then combine the bi-grams into n-grams. After redundant n-grams are eliminated, the remaining ones are ranked and selected as cluster labels. Base clusters are constructed from these cluster labels and then agglomerated into higher-level clusters. We refer to the proposed algorithm as CoHC (Co-occurrence based Hierarchical Clustering). We compare CoHC with three other search results clustering (SRC) algorithms: Suffix Tree Clustering (STC), Lingo, and Vivisimo. We also analyze the properties of the cluster labels produced by the different SRC algorithms. The experimental results show that our method outperforms the other three SRC algorithms and helps users browse and locate the results of interest.
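The pipeline described above (bi-grams, n-grams, redundancy removal, label selection, base clusters, agglomeration) can be illustrated with a rough sketch. This is not the CoHC implementation: the abstract does not state which co-occurrence statistic, ranking function, or merging criterion is used, so the sketch assumes plain document frequency for extracting and ranking phrases and Jaccard overlap for merging base clusters. All function names, parameters, and thresholds are hypothetical.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()


def frequent_bigrams(snippets, min_df=2):
    """Adjacent word pairs found in at least min_df snippets.
    Document frequency stands in for the unspecified co-occurrence statistic."""
    df = Counter()
    for s in snippets:
        toks = tokenize(s)
        df.update(set(zip(toks, toks[1:])))
    return {bg for bg, n in df.items() if n >= min_df}


def merge_into_ngrams(snippets, bigrams):
    """Chain consecutive frequent bi-grams into maximal n-grams and count
    the number of snippets in which each resulting n-gram appears."""
    ngrams = Counter()
    for s in snippets:
        toks = tokenize(s)
        found = set()
        i = 0
        while i < len(toks) - 1:
            j = i
            while j < len(toks) - 1 and (toks[j], toks[j + 1]) in bigrams:
                j += 1
            if j > i:
                found.add(tuple(toks[i:j + 1]))
                i = j
            else:
                i += 1
        ngrams.update(found)
    return ngrams


def is_subphrase(short, long):
    n = len(short)
    return n < len(long) and any(
        long[k:k + n] == short for k in range(len(long) - n + 1))


def drop_redundant(ngrams):
    """Assumed redundancy rule: drop an n-gram that is contained in a longer
    n-gram occurring in at least as many snippets."""
    return {g: n for g, n in ngrams.items()
            if not any(is_subphrase(g, h) and ngrams[h] >= n for h in ngrams)}


def select_labels(ngrams, top_k=10):
    # Assumed ranking: document frequency, then phrase length, then alphabetical.
    ranked = sorted(ngrams, key=lambda g: (-ngrams[g], -len(g), g))
    return ranked[:top_k]


def base_clusters(snippets, labels):
    """One base cluster per label: indices of the snippets containing it."""
    toks = [tuple(tokenize(s)) for s in snippets]
    return {lab: {i for i, t in enumerate(toks)
                  if lab == t or is_subphrase(lab, t)}
            for lab in labels}


def agglomerate(clusters, threshold=0.5):
    """Greedy single pass: a base cluster joins the first group whose snippet
    set overlaps it with Jaccard similarity >= threshold."""
    groups = []  # each group is (list of labels, set of snippet indices)
    for lab, docs in sorted(clusters.items(),
                            key=lambda kv: (-len(kv[1]), kv[0])):
        for labels, members in groups:
            if len(docs & members) / len(docs | members) >= threshold:
                labels.append(lab)
                members.update(docs)
                break
        else:
            groups.append(([lab], set(docs)))
    return groups


def cluster_snippets(snippets, min_df=2, top_k=10, threshold=0.5):
    ngrams = drop_redundant(
        merge_into_ngrams(snippets, frequent_bigrams(snippets, min_df)))
    labels = select_labels(ngrams, top_k)
    return agglomerate(base_clusters(snippets, labels), threshold)
```

Calling `cluster_snippets` on a list of result snippets returns groups of labels together with the indices of the snippets they cover. A real system would replace the document-frequency heuristic with a proper association measure and build more than one level of hierarchy.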