Mining subtopics from text fragments for a web query

Authors:
Qinglei Wang;Yanan Qian;Ruihua Song;Zhicheng Dou;Fan Zhang;Tetsuya Sakai;Qinghua Zheng
Affiliations:
SPKLSTN Lab, Department of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, People's Republic of China 710049;SPKLSTN Lab, Department of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, People's Republic of China 710049;Microsoft Research Asia, Beijing, People's Republic of China 100080;Microsoft Research Asia, Beijing, People's Republic of China 100080;Nankai-Baidu Joint Lab, Nankai University, Tianjin, People's Republic of China 300071;Microsoft Research Asia, Beijing, People's Republic of China 100080;SPKLSTN Lab, Department of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, People's Republic of China 710049
Venue:
Information Retrieval
Year:
2013

Citing 29

Constant interaction-time scatter/gather browsing of very large document collections
SIGIR '93 Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval

Reexamining the cluster hypothesis: scatter/gather on retrieval results
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval

Web document clustering: a feasibility demonstration
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval

The use of MMR, diversity-based reranking for reordering documents and producing summaries
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval

Grouper: a dynamic clustering interface to Web search results
WWW '99 Proceedings of the eighth international conference on World Wide Web

Bringing order to the Web: automatically categorizing search results
Proceedings of the SIGCHI conference on Human Factors in Computing Systems

Agglomerative clustering of a search engine query log
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining

Beyond independent relevance: methods and evaluation metrics for subtopic retrieval
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval

Learning to cluster web search results
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval

A Concept-Driven Algorithm for Clustering Search Results
IEEE Intelligent Systems

A web-based kernel function for measuring the similarity of short text snippets
Proceedings of the 15th international conference on World Wide Web

A scalable algorithm for high-quality clustering of web snippets
Proceedings of the 2006 ACM symposium on Applied computing

Web searching on the Vivisimo search engine
Journal of the American Society for Information Science and Technology

Learn from web search logs to organize search results
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval

An Efficient Technique for Mining Approximately Frequent Substring Patterns
ICDMW '07 Proceedings of the Seventh IEEE International Conference on Data Mining Workshops

A personalized search engine based on Web-snippet hierarchical clustering
Software: Practice & Experience

Learning query intent from regularized click graphs
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval

Diversifying search results
Proceedings of the Second ACM International Conference on Web Search and Data Mining

Intentional query suggestion: making user goals more explicit during search
Proceedings of the 2009 workshop on Web Search Click Data

An axiomatic approach for result diversification
Proceedings of the 18th international conference on World wide web

Multiple approaches to analysing query diversity
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval

Diversifying web search results
Proceedings of the 19th international conference on World wide web

Exploiting query reformulations for web search result diversification
Proceedings of the 19th international conference on World wide web

Inferring query intent from reformulations and clicks
Proceedings of the 19th international conference on World wide web

Diversification of search results using webgraphs
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval

Selectively diversifying web search results
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management

A comparative analysis of cascade measures for novelty and diversity
Proceedings of the fourth ACM international conference on Web search and data mining

An exploration of pattern-based subtopic modeling for search result diversification
Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries

Evaluating diversified search results using per-intent graded relevance
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval

Cited 1

Improving search relevance for short queries in community question answering
Proceedings of the 7th ACM international conference on Web search and data mining

Abstract

Web search queries are often ambiguous or faceted, and identifying the major underlying senses and facets of a query has received much attention in recent years. We refer to this task as query subtopic mining. In this paper, we propose mining and ranking subtopics from the text surrounding query terms in the top retrieved documents. We first extract text fragments containing query terms from different parts of documents. We then group similar text fragments into clusters and generate a readable subtopic for each cluster. Based on the cluster and a language model trained on a query log, we calculate three features and combine them into a relevance score for each subtopic. Finally, subtopics are ranked by balancing relevance and novelty. Our evaluation experiments on the NTCIR-9 INTENT Chinese Subtopic Mining test collection show that our method significantly outperforms a query-log-based method proposed by Radlinski et al. (2010) and a search-result-clustering-based method proposed by Zeng et al. (2004) in terms of precision, I-rec, D-nDCG and D#-nDCG, the official evaluation metrics of the NTCIR-9 INTENT task. Moreover, our generated subtopics are significantly more readable than those generated by the search result clustering method.
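
The pipeline outlined in the abstract (extract fragments around query terms, score candidate subtopics, then rank by trading relevance against novelty) can be illustrated with a small Python sketch. This is not the paper's method: the abstract does not give the fragment boundaries, the three relevance features, or the ranking formula. The sketch assumes a single-word query, whitespace-tokenized English text, hand-supplied relevance scores, Jaccard word overlap as the redundancy measure, and a generic greedy MMR-style selection. The names `extract_fragments`, `jaccard`, `rank_subtopics`, `window`, and `trade_off` are all hypothetical.

```python
import re


def extract_fragments(query, text, window=3):
    """Return word windows around each occurrence of a single-word query.

    Assumption: a fixed window of `window` tokens on each side; the paper
    extracts fragments from different parts of documents in its own way.
    """
    tokens = re.findall(r"\w+", text.lower())
    q = query.lower()
    fragments = []
    for i, tok in enumerate(tokens):
        if tok == q:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            fragments.append(" ".join(tokens[lo:hi]))
    return fragments


def jaccard(a, b):
    """Word-set overlap between two strings, in [0, 1]."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def rank_subtopics(candidates, trade_off=0.7, k=3):
    """Greedy MMR-style ranking (an assumed stand-in, not the paper's formula).

    `candidates` maps each subtopic string to a relevance score. At each
    step the subtopic with the best relevance-minus-redundancy gain is chosen.
    """
    selected = []
    remaining = dict(candidates)
    while remaining and len(selected) < k:
        def gain(s):
            # Redundancy: highest similarity to anything already selected.
            redundancy = max((jaccard(s, t) for t in selected), default=0.0)
            return trade_off * remaining[s] - (1 - trade_off) * redundancy

        best = max(remaining, key=gain)
        selected.append(best)
        del remaining[best]
    return selected


if __name__ == "__main__":
    doc = "The jaguar car price is high. A jaguar animal lives in forests."
    print(extract_fragments("jaguar", doc, window=2))
    scores = {
        "jaguar car price": 0.9,
        "jaguar car prices": 0.85,
        "jaguar animal habitat": 0.6,
    }
    print(rank_subtopics(scores, trade_off=0.5, k=2))
```

With `trade_off=0.5`, the near-duplicate "jaguar car prices" loses to the less relevant but more novel "jaguar animal habitat" for the second slot, which is the behaviour that balancing relevance and novelty is meant to produce.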