Beyond precision@10: clustering the long tail of web search results

Authors:
Benno Stein;Tim Gollub;Dennis Hoppe
Affiliations:
Bauhaus-Universität, Weimar, Germany;Bauhaus-Universität, Weimar, Germany;Bauhaus-Universität, Weimar, Germany
Venue:
Proceedings of the 20th ACM international conference on Information and knowledge management
Year:
2011

Citing 21
Cited 1

Grouper: a dynamic clustering interface to Web search results

WWW '99 Proceedings of the eighth international conference on World Wide Web
Using Noun Phrase Heads to Extract Document Keyphrases

AI '00 Proceedings of the 13th Biennial Conference of the Canadian Society on Computational Studies of Intelligence: Advances in Artificial Intelligence
Beyond independent relevance: methods and evaluation metrics for subtopic retrieval

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
A hierarchical monothetic document clustering algorithm for summarization and browsing search results

Proceedings of the 13th international conference on World Wide Web
A personalized search engine based on web-snippet hierarchical clustering

WWW '05 Special interest tracks and posters of the 14th international conference on World Wide Web
A clustering method for news articles retrieval system

WWW '05 Special interest tracks and posters of the 14th international conference on World Wide Web
Clustering versus faceted categories for information exploration

Communications of the ACM - Supporting exploratory search
Beyond the session timeout: automatic hierarchical segmentation of search topics in query logs

Proceedings of the 17th ACM conference on Information and knowledge management
Diversifying search results

Proceedings of the Second ACM International Conference on Web Search and Data Mining
Clustering Narrow-Domain Short Texts by Using the Kullback-Leibler Distance

CICLing '07 Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing
A survey of Web clustering engines

ACM Computing Surveys (CSUR)
Essential Pages

WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
Full-Subtopic Retrieval with Keyphrase-Based Search Results Clustering

WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
Prototype hierarchy based clustering for the categorization and navigation of web collections

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Optimal meta search results clustering

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Faceted Search

Faceted Search
Inducing word senses to improve web search result clustering

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Web search solved?: all result rankings the same?

CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Comprehensible and accurate cluster labels in text clustering

Large Scale Semantic Access to Content (Text, Image, Video, and Sound)
Search result diversity for informational queries

Proceedings of the 20th international conference on World wide web
Cluster generation and cluster labelling for web snippets: a fast and accurate hierarchical solution

SPIRE'06 Proceedings of the 13th international conference on String Processing and Information Retrieval

Search result presentation based on faceted clustering

Proceedings of the 21st ACM international conference on Information and knowledge management

Quantified Score

Hi-index	0.00

Visualization

Abstract

The paper addresses the missing user acceptance of web search result clustering. We report on selected analyses and propose new concepts to improve existing result clustering approaches. Our findings in a nutshell are: 1. Don't compete with a search engine's top hits. In response to a query we presume search engines to return an optimal result list in the sense of the probabilistic ranking principle: documents that are expected by the majority of users are placed on top and form the result list head. We argue that, with respect to the top results, it is not beneficial to replace this established form of result presentation. 2. Improve document access in the result list tail. Documents that address the information need of "minorities" appear at some position in the result list tail. Especially for ambiguous and multi-faceted queries we expect this tail to be long, with many users appreciating different documents. In this situation web search result clustering can improve user satisfaction by reorganizing the long tail into topic-specific clusters. 3. Avoid shadowing when constructing cluster labels. We show that most of the cluster labels that are generated by current clustering technology occur within the snippets of the result list head--an effect which we call shadowing. The value of such labels for topic organization and navigating within a clustering of the entire result list is limited. We propose and analyze a filtering approach to significantly alleviate the label shadowing effect.