Informative Polythetic Hierarchical Ephemeral Clustering

Authors:
Gaël Dias;Guillaume Cleuziou;David Machado
Affiliations:
-;-;-
Venue:
WI-IAT '11 Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01
Year:
2011

Citing 8
Cited 2

Reexamining the cluster hypothesis: scatter/gather on retrieval results

SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Web document clustering: a feasibility demonstration

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
A clustering algorithm for asymmetrically related data with applications to text mining

Proceedings of the tenth international conference on Information and knowledge management
Generating hierarchical summaries for web searches

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
A hierarchical monothetic document clustering algorithm for summarization and browsing search results

Proceedings of the 13th international conference on World Wide Web
Learning to cluster web search results

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
A personalized search engine based on Web-snippet hierarchical clustering

Software—Practice & Experience
Universal Mobile Information Retrieval

UAHCI '09 Proceedings of the 5th International on ConferenceUniversal Access in Human-Computer Interaction. Part II: Intelligent and Ubiquitous Interaction Environments

Using ephemeral clustering and query logs to organize web image search results on mobile devices

IMMPD '11 Proceedings of the 2011 international ACM workshop on Interactive multimedia on mobile and portable devices
Using text-based web image search results clustering to minimize mobile devices wasted space-interface

ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval

Quantified Score

Hi-index	0.01

Visualization

Abstract

Ephemeral clustering has been studied for more than a decade, although with low user acceptance. According to us, this situation is mainly due to (1) an excessive number of generated clusters, which makes browsing difficult and (2) low quality labeling, which introduces imprecision within the search process. In this paper, our motivation is twofold. First, we propose to reduce the number of clusters of Web page results, but keeping all different query meanings. For that purpose, we propose a new polythetic methodology based on an informative similarity measure, the InfoSimba, and a new hierarchical clustering algorithm, the HISGK-means. Second, a theoretical background is proposed to define meaningful cluster labels embedded in the definition of the HISGK-means algorithm, which may elect as best label, words outside the given cluster. To confirm our intuitions, we propose a new evaluation framework, which shows that we are able to extract most of the important query meanings but generating much less clusters than state-of-the-art systems.