Topical clustering of search results

Authors:
Ugo Scaiella;Paolo Ferragina;Andrea Marino;Massimiliano Ciaramita
Affiliations:
University of Pisa, Pisa, Italy;University of Pisa, Pisa, Italy;University of Florence, Florence, Italy;Google Research, Zürich, Switzerland
Venue:
Proceedings of the fifth ACM international conference on Web search and data mining
Year:
2012

References (20)

A personalized search engine based on web-snippet hierarchical clustering
WWW '05 Special interest tracks and posters of the 14th international conference on World Wide Web

A Concept-Driven Algorithm for Clustering Search Results
IEEE Intelligent Systems

A web-based kernel function for measuring the similarity of short text snippets
Proceedings of the 15th international conference on World Wide Web

AI Gets a Brain
Queue - AI

Learn from web search logs to organize search results
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval

Clustering short texts using wikipedia
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval

A tutorial on spectral clustering
Statistics and Computing

Enhancing text clustering by leveraging Wikipedia semantics
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval

Spectral geometry for simultaneously clustering and ranking query search results
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval

Topical query decomposition
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining

Learning to link with wikipedia
Proceedings of the 17th ACM conference on Information and knowledge management

Clustering Documents Using a Wikipedia-Based Concept Representation
PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining

A survey of Web clustering engines
ACM Computing Surveys (CSUR)

Collective annotation of Wikipedia entities in web text
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining

Wikipedia-based semantic interpretation for natural language processing
Journal of Artificial Intelligence Research

Exploiting internal and external semantics for the clustering of short texts using world knowledge
Proceedings of the 18th ACM conference on Information and knowledge management

Optimal meta search results clustering
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval

TAGME: on-the-fly annotation of short text fragments (by wikipedia entities)
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management

Crowdsourcing for book search evaluation: impact of hit design on comparative system ranking
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval

Robust disambiguation of named entities in text
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing

Cited by (11)

Topic-driven reader comments summarization
Proceedings of the 21st ACM international conference on Information and knowledge management

Increasing stability of result organization for session search
ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval

Using text-based web image search results clustering to minimize mobile devices wasted space-interface
ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval

Improved text annotation with Wikipedia entities
Proceedings of the 28th Annual ACM Symposium on Applied Computing

InfoLand: information lay-of-land for session search
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

Search result presentation: supporting post-search navigation by integration of taxonomy data
Proceedings of the 22nd international conference on World Wide Web companion

A framework for benchmarking entity-annotation systems
Proceedings of the 22nd international conference on World Wide Web

Navigating the topical structure of academic search results via the Wikipedia category network
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management

Exploiting DBpedia for web search results clustering
Proceedings of the 2013 workshop on Automated knowledge base construction

Knowledge-based graph document modeling
Proceedings of the 7th ACM international conference on Web search and data mining

Acquisition of open-domain classes via intersective semantics
Proceedings of the 23rd international conference on World wide web

Abstract

Search results clustering (SRC) is a challenging algorithmic problem: the results returned by one or more search engines must be grouped into topically coherent clusters, and each cluster must be labeled with a meaningful phrase describing the topic of the results it contains. In this paper we propose to solve SRC by modeling it as the labeled clustering of the nodes of a newly introduced graph of topics. The topics are Wikipedia pages identified by applying recently proposed topic annotators [9, 11, 16, 20] to the search results, and the edges denote the relatedness among these topics, computed from the link structure of the Wikipedia graph. We tackle this problem with a novel algorithm that exploits both the spectral properties and the labels of this graph of topics. We show that our approach outperforms academic state-of-the-art work [6] and well-known commercial systems (CLUSTY and LINGO3G) through an extensive set of experiments on standard datasets and user studies run via Amazon Mechanical Turk. Across several standard evaluation measures, our approach achieves a relative improvement of up to 20%.
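
The abstract only outlines the pipeline, so the following is a minimal illustrative sketch of the general idea (a topic graph weighted by link-based relatedness, split by a spectral method), not the authors' algorithm. Several things here are assumptions: the relatedness function follows the in-link overlap measure commonly attributed to Milne and Witten, the split is a plain two-way spectral bisection computed by power iteration, and the topic names, in-link sets, and `N_PAGES` are invented toy data. Cluster labeling, which the paper treats as part of the problem, is not covered.

```python
import math

def relatedness(in_a, in_b, n_pages):
    """Link-overlap relatedness in [0, 1] from two in-link sets (assumed measure).
    Assumes n_pages is larger than either set."""
    common = len(in_a & in_b)
    if common == 0:
        return 0.0
    big, small = max(len(in_a), len(in_b)), min(len(in_a), len(in_b))
    dist = (math.log(big) - math.log(common)) / (math.log(n_pages) - math.log(small))
    return max(0.0, 1.0 - dist)

def spectral_bisect(weights, iters=200):
    """Two-way split of a connected weighted graph using the second eigenvector
    of the normalized adjacency matrix, found by deflated power iteration.
    Returns, for each node, whether it falls on the same side as node 0."""
    n = len(weights)
    deg = [sum(row) for row in weights]  # assumes every node has positive degree
    # M = D^{-1/2} W D^{-1/2}; its top eigenvector is proportional to sqrt(deg).
    m = [[weights[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    top = [math.sqrt(d) for d in deg]
    scale = math.sqrt(sum(t * t for t in top))
    top = [t / scale for t in top]
    v = [float(i + 1) for i in range(n)]  # fixed start vector keeps the run deterministic
    for _ in range(iters):
        # Apply (M + I) / 2 so that all eigenvalues are non-negative.
        v = [0.5 * (v[i] + sum(m[i][j] * v[j] for j in range(n))) for i in range(n)]
        # Remove the component along the trivial top eigenvector.
        dot = sum(a * b for a, b in zip(v, top))
        v = [a - dot * b for a, b in zip(v, top)]
        scale = math.sqrt(sum(a * a for a in v))
        v = [a / scale for a in v]
    return [(a >= 0) == (v[0] >= 0) for a in v]

def cluster_topics(in_links, n_pages):
    """Build the topic graph from in-link sets and split it into two clusters."""
    names = list(in_links)
    weights = [[0.0 if a == b else relatedness(in_links[a], in_links[b], n_pages)
                for b in names] for a in names]
    same_side = spectral_bisect(weights)
    first = [t for t, s in zip(names, same_side) if s]
    second = [t for t, s in zip(names, same_side) if not s]
    return [first, second]

# Hypothetical toy data: topic -> set of IDs of pages linking to it.
N_PAGES = 1000
TOY_IN_LINKS = {
    "Jaguar": set(range(0, 20)),
    "Felidae": set(range(5, 25)),
    "Jaguar Cars": set(range(100, 119)) | {19},
    "Land Rover": set(range(105, 125)),
}

clusters = cluster_topics(TOY_IN_LINKS, N_PAGES)
print(clusters)
```

In this toy setting the two animal topics share most of their in-links, as do the two car topics, while only one page links across the groups, so the bisection is expected to separate them. A real system would need more than two clusters and a way to pick a label per cluster; both are left out here.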