Mining subtopics from different aspects for diversifying search results

Authors:
Chieh-Jen Wang;Yung-Wei Lin;Ming-Feng Tsai;Hsin-Hsi Chen
Affiliations:
Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan 10617;Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan 10617;Department of Computer Science and Program in Digital Content and Technologies, National Chengchi University, Taipei, Taiwan 11605;Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan 10617
Venue:
Information Retrieval
Year:
2013

Abstract

User queries to the Web often have more than one interpretation because of their ambiguity and other characteristics. How to diversify ranked results to meet users' various potential information needs has recently attracted considerable attention. This paper mines the subtopics of a query, either indirectly from the results returned by retrieval systems or directly from the query itself, and uses them to diversify the search results. For the indirect subtopic mining approach, we investigate clustering the retrieval results and summarizing the content of each cluster, and we also explore labeling each returned document with topic categories and concept tags. For the direct subtopic mining approach, we consult several external resources, such as Wikipedia, the Open Directory Project, search query logs, and the related-search services of search engines. Furthermore, we propose a diversified retrieval model that ranks documents with respect to the mined subtopics to balance relevance and diversity. Experiments are conducted on the ClueWeb09 dataset with the topics of the TREC09 and TREC10 Web Track diversity tasks. The results show that the proposed subtopic-based diversification algorithm significantly outperforms the state-of-the-art models in both the TREC09 and TREC10 Web Track diversity tasks. Its best performance is α-nDCG@5 0.307, IA-P@5 0.121, and α#-nDCG@5 0.214 on TREC09, and α-nDCG@10 0.421, IA-P@10 0.201, and α#-nDCG@10 0.311 on TREC10. The results indicate that mining subtopics from up-to-date search query logs is the most effective way to generate the subtopics of a query, and that the proposed subtopic-based diversification algorithm selects documents covering a variety of subtopics.
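The abstract does not spell out the proposed diversified retrieval model, so the sketch below illustrates the general idea with a swapped-in technique: an xQuAD-style greedy reranker that trades off a document's relevance against how well it covers mined subtopics not yet covered by higher-ranked documents, followed by α-nDCG@k, the main metric reported above. It is a minimal sketch, not the authors' model. The function names, the coverage probabilities `cov`, the subtopic weights `sub_w`, and the trade-off `lam` are hypothetical inputs; α = 0.5 is assumed (the usual TREC Web Track setting); IA-P and α#-nDCG are not implemented.

```python
from math import log2


def greedy_diversify(rel, cov, sub_w, k, lam=0.5):
    """Greedy xQuAD-style reranking over mined subtopics (a stand-in model).

    rel[d]    -- relevance score of document d for the query, assumed in [0, 1]
    cov[d][s] -- assumed probability that document d covers subtopic s
    sub_w[s]  -- assumed importance weight of mined subtopic s (weights sum to 1)
    lam       -- trade-off: 0 ranks by relevance only, 1 by subtopic coverage only
    """
    selected = []
    # Probability that each subtopic is still not covered by the selected documents.
    uncovered = {s: 1.0 for s in sub_w}
    candidates = set(rel)

    def score(d):
        novelty = sum(sub_w[s] * cov.get(d, {}).get(s, 0.0) * uncovered[s] for s in sub_w)
        return (1 - lam) * rel[d] + lam * novelty

    while candidates and len(selected) < k:
        # sorted() makes ties break deterministically by document id.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
        for s in sub_w:
            uncovered[s] *= 1.0 - cov.get(best, {}).get(s, 0.0)
    return selected


def alpha_dcg(ranking, qrels, k, alpha=0.5):
    """alpha-DCG@k; qrels[d] is the set of subtopics document d is judged relevant to."""
    seen = {}  # subtopic -> how many higher-ranked documents already covered it
    total = 0.0
    for i, d in enumerate(ranking[:k]):
        subs = qrels.get(d, set())
        gain = sum((1 - alpha) ** seen.get(s, 0) for s in subs)
        for s in subs:
            seen[s] = seen.get(s, 0) + 1
        total += gain / log2(i + 2)  # rank i+1 is discounted by log2(rank + 1)
    return total


def alpha_ndcg(ranking, qrels, k, alpha=0.5):
    """alpha-nDCG@k, normalized by a greedily built ideal ranking."""
    # An exact ideal ranking is NP-hard to find, so the common greedy approximation is used.
    pool, ideal, seen = list(qrels), [], {}
    while pool and len(ideal) < k:
        best = max(pool, key=lambda d: sum((1 - alpha) ** seen.get(s, 0) for s in qrels[d]))
        ideal.append(best)
        pool.remove(best)
        for s in qrels[best]:
            seen[s] = seen.get(s, 0) + 1
    best_dcg = alpha_dcg(ideal, qrels, k, alpha)
    return alpha_dcg(ranking, qrels, k, alpha) / best_dcg if best_dcg > 0 else 0.0


if __name__ == "__main__":
    # Toy data (hypothetical): d1 and d2 both cover subtopic s1, only d3 covers s2.
    rel = {"d1": 0.9, "d2": 0.85, "d3": 0.5}
    cov = {"d1": {"s1": 1.0}, "d2": {"s1": 1.0}, "d3": {"s2": 1.0}}
    sub_w = {"s1": 0.5, "s2": 0.5}
    qrels = {"d1": {"s1"}, "d2": {"s1"}, "d3": {"s2"}}

    diversified = greedy_diversify(rel, cov, sub_w, k=3)
    print(diversified)                                            # ['d1', 'd3', 'd2']
    print(alpha_ndcg(diversified, qrels, k=3))                    # 1.0
    print(round(alpha_ndcg(["d1", "d2", "d3"], qrels, k=3), 3))   # 0.965
```

With `lam=0` the reranker reduces to ordering by relevance alone; on the toy data that order places the redundant d2 second, which repeats subtopic s1 and therefore earns a lower α-nDCG@3 than the diversified order.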