Combining implicit and explicit topic representations for result diversification

Authors:
Jiyin He; Vera Hollink; Arjen de Vries
Affiliations:
Centrum Wiskunde en Informatica, Amsterdam, Netherlands; Centrum Wiskunde en Informatica, Amsterdam, Netherlands; Centrum Wiskunde en Informatica, Amsterdam, Netherlands
Venue:
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Year:
2012

Abstract

Result diversification deals with ambiguous or multi-faceted queries by providing documents that cover as many subtopics of a query as possible. Various approaches to subtopic modeling have been proposed. Subtopics have been extracted internally, e.g., from retrieved documents, and externally, e.g., from Web resources such as query logs. Internally modeled subtopics are often implicitly represented, e.g., as latent topics, while externally modeled subtopics are often explicitly represented, e.g., as reformulated queries. We propose a framework that: i) combines both implicitly and explicitly represented subtopics; and ii) allows flexible combination of multiple external resources in a transparent and unified manner. Specifically, we use a random-walk-based approach to estimate the similarities of the explicit subtopics mined from a number of heterogeneous resources: click logs, anchor text, and web n-grams. We then use these similarities to regularize the latent topics extracted from the top-ranked documents, i.e., the internal (implicit) subtopics. Empirical results show that regularization with explicit subtopics extracted from the right resource leads to improved diversification results, indicating that the proposed regularization with (explicit) external resources forms better (implicit) topic models. Click logs and anchor text are shown to be more effective resources than web n-grams under current experimental settings. Combining resources does not always lead to better results, but it achieves robust performance. This robustness is important for two reasons: it cannot be predicted which resources will be most effective for a given query, and it is not yet known how to reliably determine the optimal model parameters for building implicit topic models.
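
The abstract does not spell out the random walk formulation, so the following is only a minimal sketch of one generic way to estimate similarities between explicit subtopics from a click log. It assumes a bipartite graph of subtopic queries and clicked URLs, a two-step walk (query to URL and back to a query), and a simple symmetrization of the resulting probabilities. The click data, the function names, and the symmetrization are all hypothetical illustrations and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical click log: (subtopic query, clicked URL, click count).
CLICKS = [
    ("jaguar car", "jaguar.com", 8),
    ("jaguar car", "cars.example/jaguar", 4),
    ("jaguar price", "cars.example/jaguar", 6),
    ("jaguar price", "jaguar.com", 2),
    ("jaguar animal", "zoo.example/jaguar", 9),
    ("jaguar habitat", "zoo.example/jaguar", 5),
    ("jaguar habitat", "wiki.example/jaguar", 5),
]


def build_transitions(clicks):
    """Row-normalised transition probabilities on the undirected bipartite graph."""
    weights = defaultdict(dict)
    for query, url, count in clicks:
        q, u = ("q", query), ("u", url)  # tag nodes so queries and URLs never collide
        weights[q][u] = weights[q].get(u, 0) + count
        weights[u][q] = weights[u].get(q, 0) + count
    return {
        node: {nbr: w / sum(nbrs.values()) for nbr, w in nbrs.items()}
        for node, nbrs in weights.items()
    }


def walk(transitions, start, steps):
    """Distribution over nodes after `steps` steps of a walk started at `start`."""
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, p in dist.items():
            for nbr, t in transitions[node].items():
                nxt[nbr] += p * t
        dist = dict(nxt)
    return dist


def subtopic_similarity(clicks, steps=2):
    """Symmetrised walk probability between every pair of subtopic queries.

    `steps` should be even: on a bipartite graph, only an even number of
    steps takes a walk that starts at a query back to a query node.
    """
    transitions = build_transitions(clicks)
    queries = sorted({q for q, _, _ in clicks})
    reach = {q: walk(transitions, ("q", q), steps) for q in queries}
    sim = {}
    for a in queries:
        for b in queries:
            p_ab = reach[a].get(("q", b), 0.0)
            p_ba = reach[b].get(("q", a), 0.0)
            sim[(a, b)] = 0.5 * (p_ab + p_ba)  # assumed symmetrisation
    return sim


SIM = subtopic_similarity(CLICKS)
```

In this toy data, "jaguar car" and "jaguar price" share clicked URLs and therefore receive a positive similarity, while "jaguar car" and "jaguar animal" share none and receive zero after two steps. A similarity matrix of this kind is the sort of signal that could then be used to regularize latent topics; how the paper actually does so is not described in the abstract.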