Diversifying top-k results

Authors:
Lu Qin;Jeffrey Xu Yu;Lijun Chang
Affiliations:
The Chinese University of Hong Kong, Hong Kong, China;The Chinese University of Hong Kong, Hong Kong, China;The Chinese University of Hong Kong, Hong Kong, China
Venue:
Proceedings of the VLDB Endowment
Year:
2012


Abstract

Top-k query processing finds a list of k results that have the largest scores with respect to the user-given query, under the assumption that all k results are independent of each other. In practice, some of the top-k results returned can be very similar to each other, and are therefore redundant. In the literature, diversified top-k search has been studied to return k results that take both score and diversity into consideration. Most existing solutions for diversified top-k search assume that the scores of all search results are given, and some works solve the diversity problem for a specific setting and can hardly be extended to general cases. In this paper, we study the diversified top-k search problem. We define a general diversified top-k search problem that considers only the similarity of the search results themselves. We propose a framework such that most existing solutions for top-k query processing can be extended easily to handle diversified top-k search by simply applying three new functions: a sufficient stop condition, sufficient(); a necessary stop condition, necessary(); and an algorithm for diversified top-k search on the current set of generated results, div-search-current(). We propose three new algorithms, namely div-astar, div-dp, and div-cut, to solve the div-search-current() problem. div-astar is an A*-based algorithm. div-dp decomposes the results into components, which are searched independently using div-astar and then combined using dynamic programming. div-cut further decomposes the current set of generated results using cut points and combines the results using sophisticated operations. We conducted extensive performance studies using two real datasets, enwiki and reuters. Our div-cut algorithm finds the optimal solution to the diversified top-k search problem in seconds, even for k as large as 2,000.
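
To make the problem statement concrete, the following is a minimal sketch of exact diversified top-k selection on a small, fully generated result set. It is not div-astar, div-dp, or div-cut; it is a plain branch-and-bound search. It assumes a formulation that the abstract does not spell out: each result has a non-negative score, similarity is given as a set of unordered pairs, and the goal is to pick at most k pairwise non-similar results with maximum total score. The function name `diversified_top_k` and its parameters are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def diversified_top_k(
    scores: Dict[str, float],
    similar: Iterable[Tuple[str, str]],
    k: int,
) -> Tuple[float, List[str]]:
    """Pick at most k pairwise non-similar results with maximum total score.

    Assumed formulation (not taken from the abstract): scores are
    non-negative and `similar` lists unordered pairs of similar results.
    """
    # Visit results from highest to lowest score so the bound below is valid.
    items = sorted(scores, key=lambda v: scores[v], reverse=True)
    adj = {v: set() for v in items}
    for a, b in similar:
        adj[a].add(b)
        adj[b].add(a)

    best_total = 0.0
    best_set: List[str] = []

    def search(idx: int, chosen: List[str], total: float) -> None:
        nonlocal best_total, best_set
        if total > best_total:
            best_total, best_set = total, list(chosen)
        if idx == len(items) or len(chosen) == k:
            return
        # Optimistic bound: pretend the next highest-scoring results
        # could all be added without any similarity conflict.
        remaining = k - len(chosen)
        bound = total + sum(scores[v] for v in items[idx: idx + remaining])
        if bound <= best_total:
            return
        v = items[idx]
        # Branch 1: take v if it is not similar to anything already chosen.
        if all(u not in adj[v] for u in chosen):
            chosen.append(v)
            search(idx + 1, chosen, total + scores[v])
            chosen.pop()
        # Branch 2: skip v.
        search(idx + 1, chosen, total)

    search(0, [], 0.0)
    return best_total, best_set


if __name__ == "__main__":
    example_scores = {"a": 10, "b": 9, "c": 8, "d": 1}
    example_similar = [("a", "b"), ("a", "c")]
    print(diversified_top_k(example_scores, example_similar, k=2))
```

In this example the plain top-2 by score would be `a` and `b`, which are similar to each other. Under the assumed objective, the search instead prefers `b` and `c`, whose combined score exceeds that of `a` paired with `d`. The search is exponential in the worst case, which is the kind of cost the decomposition-based algorithms named in the abstract aim to reduce.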