An exploration of pattern-based subtopic modeling for search result diversification

Authors:
Wei Zheng; Xuanhui Wang; Hui Fang; Hong Cheng
Affiliations:
University of Delaware, Newark, DE, USA; Yahoo!, Santa Clara, CA, USA; University of Delaware, Newark, DE, USA; Chinese University of Hong Kong, Hong Kong, China
Venue:
Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries
Year:
2011

Abstract

Traditional information retrieval models do not necessarily provide users with an optimal search experience, because the top-ranked documents may contain the same piece of relevant information, i.e., the same subtopic of a query. The goal of search result diversification is to return results that are not only relevant to the query but also cover different subtopics. Subtopic modeling is therefore an important research topic in search result diversification. In this paper, we propose a novel pattern-based method to extract subtopics from retrieved documents. The basic idea is to explicitly model a query subtopic as a semantically meaningful text unit in relevant documents. We apply a frequent pattern mining algorithm to efficiently extract these text units (patterns) from the retrieved documents. We then model each query subtopic with a single pattern and rank the subtopics by their similarity to the query. These pattern-based subtopics are then used to diversify search results.
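
The pipeline outlined in the abstract (mine frequent patterns, rank them against the query, diversify with the resulting subtopics) can be illustrated with a small sketch. The abstract does not specify the mining algorithm, the similarity function, or the diversification strategy, so everything below is an assumption made for illustration: patterns are treated as unordered term sets counted by brute-force enumeration, similarity is Jaccard overlap with the query terms, and diversification is a greedy coverage heuristic. The function names and the toy documents are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def mine_frequent_patterns(docs, min_support=2, max_size=2):
    """Return term sets (up to max_size terms) found in >= min_support docs.

    Assumption: brute-force enumeration stands in for a real frequent
    pattern mining algorithm; it is only practical for tiny inputs.
    """
    counts = Counter()
    for doc in docs:
        terms = sorted(set(doc))
        for size in range(1, max_size + 1):
            for combo in combinations(terms, size):
                counts[frozenset(combo)] += 1
    return {p: c for p, c in counts.items() if c >= min_support}


def rank_subtopics(patterns, query_terms, top_k=3):
    """Rank patterns by Jaccard similarity with the query (an assumed measure).

    Patterns with no query overlap, or made up only of query terms,
    are skipped because they do not describe a distinct subtopic.
    """
    q = set(query_terms)

    def score(p):
        return len(p & q) / len(p | q)

    candidates = [p for p in patterns if score(p) > 0 and not p <= q]
    # Ties are broken by document support, then alphabetically.
    candidates.sort(key=lambda p: (-score(p), -patterns[p], sorted(p)))
    return candidates[:top_k]


def diversify(docs, subtopics, k=2):
    """Greedily pick k docs, each covering the most uncovered subtopics.

    Assumption: docs are already in relevance order, so ties go to the
    earlier (higher-ranked) document.
    """
    doc_sets = [set(d) for d in docs]
    covered, selected = set(), []
    remaining = list(range(len(docs)))
    while remaining and len(selected) < k:
        def gain(i):
            return sum(1 for s in subtopics
                       if s not in covered and s <= doc_sets[i])
        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
        covered.update(s for s in subtopics if s <= doc_sets[best])
    return selected


# Toy retrieved documents for the ambiguous query "jaguar" (hypothetical data).
docs = [
    ["jaguar", "car", "dealer"],
    ["jaguar", "car", "price"],
    ["jaguar", "cat", "habitat"],
    ["jaguar", "cat", "zoo"],
]
patterns = mine_frequent_patterns(docs)
subtopics = rank_subtopics(patterns, ["jaguar"])
reranked = diversify(docs, subtopics, k=2)
```

In this toy run, the two surviving subtopics are the term sets {jaguar, car} and {jaguar, cat}, and the greedy step selects one document for each instead of returning the two car documents that head the original ranking.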