ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
Ephemeral clustering has been studied for more than a decade, yet user acceptance remains low. In our view, this is mainly due to (1) an excessive number of generated clusters, which makes browsing difficult, and (2) low-quality labeling, which introduces imprecision into the search process. In this paper, our motivation is twofold. First, we propose to reduce the number of clusters of Web page results while preserving all the different query meanings. For that purpose, we propose a new polythetic methodology based on an informative similarity measure, InfoSimba, and a new hierarchical clustering algorithm, HISGK-means. Second, we propose a theoretical background for defining meaningful cluster labels, embedded in the definition of the HISGK-means algorithm, which may select as the best label words that lie outside the given cluster. To confirm our intuitions, we propose a new evaluation framework, which shows that we are able to extract most of the important query meanings while generating far fewer clusters than state-of-the-art systems.