Top-k query processing finds a list of k results that have the largest scores with respect to a user-given query, under the assumption that the k results are independent of each other. In practice, some of the top-k results returned can be very similar to each other and are therefore redundant. In the literature, diversified top-k search has been studied to return k results that take both score and diversity into consideration. Most existing solutions for diversified top-k search assume that the scores of all the search results are given, and some solve the diversity problem for a specific problem setting and can hardly be extended to general cases. In this paper, we study the diversified top-k search problem. We define a general diversified top-k search problem that considers only the similarity among the search results themselves. We propose a framework in which most existing solutions for top-k query processing can easily be extended to handle diversified top-k search by applying three new functions: a sufficient stop condition, sufficient(); a necessary stop condition, necessary(); and an algorithm for diversified top-k search on the current set of generated results, div-search-current(). We propose three new algorithms, namely div-astar, div-dp, and div-cut, to solve the div-search-current() problem. div-astar is an A*-based algorithm. div-dp decomposes the results into components, which are searched independently using div-astar and then combined using dynamic programming. div-cut further decomposes the current set of generated results using cut points and combines the partial results using sophisticated operations. We conducted extensive performance studies using two real datasets, enwiki and reuters. Our div-cut algorithm finds the optimal solution for the diversified top-k search problem in seconds, even for k as large as 2,000.
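The abstract states the problem only informally. The sketch below makes two assumptions. First, two results conflict when a user-supplied predicate `similar(i, j)` says they are too alike, for example when their similarity exceeds some threshold. Second, the goal is a set of at most k pairwise non-conflicting results with the largest total score.

Under those assumptions, the sketch mirrors the decomposition idea behind div-dp:

- split the conflict graph into connected components;
- solve each component for every answer size;
- combine the per-component tables with dynamic programming.

The per-component search is plain exhaustive enumeration. It stands in for div-astar and is not the authors' A* algorithm. All function names (`conflict_graph`, `components`, `best_per_size`, `diversified_top_k`) are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

# (total score, chosen result ids), or None when no conflict-free set of that size exists
Entry = Optional[Tuple[float, Tuple[int, ...]]]
Similar = Callable[[int, int], bool]  # assumption: True means "too similar to co-occur"


def conflict_graph(n: int, similar: Similar) -> Dict[int, Set[int]]:
    """Edge (i, j) means results i and j must not both appear in the answer."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if similar(i, j):
            adj[i].add(j)
            adj[j].add(i)
    return adj


def components(adj: Dict[int, Set[int]]) -> List[List[int]]:
    """Connected components of the conflict graph, found with an iterative DFS."""
    seen: Set[int] = set()
    comps: List[List[int]] = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps


def best_per_size(comp: List[int], adj: Dict[int, Set[int]],
                  scores: Sequence[float], k: int) -> List[Entry]:
    """table[m] = best conflict-free subset of exactly m results inside one component.
    Exhaustive enumeration: a simple stand-in for an A*-style search such as div-astar."""
    table: List[Entry] = [None] * (k + 1)

    def extend(start: int, chosen: List[int], total: float) -> None:
        m = len(chosen)
        if table[m] is None or total > table[m][0]:
            table[m] = (total, tuple(chosen))
        if m == k:
            return
        for t in range(start, len(comp)):
            u = comp[t]
            if not any(u in adj[c] for c in chosen):  # keep the set conflict-free
                chosen.append(u)
                extend(t + 1, chosen, total + scores[u])
                chosen.pop()

    extend(0, [], 0.0)
    return table


def diversified_top_k(scores: Sequence[float], similar: Similar,
                      k: int) -> Tuple[float, Tuple[int, ...]]:
    """Max-score set of at most k pairwise non-conflicting results (div-dp-style sketch)."""
    adj = conflict_graph(len(scores), similar)
    # acc[m]: best m results drawn from the components processed so far
    acc: List[Entry] = [(0.0, ())] + [None] * k
    for comp in components(adj):
        table = best_per_size(comp, adj, scores, k)
        new: List[Entry] = [None] * (k + 1)
        for a, left in enumerate(acc):
            if left is None:
                continue
            for b in range(k + 1 - a):
                right = table[b]
                if right is None:
                    continue
                cand = (left[0] + right[0], left[1] + right[1])
                if new[a + b] is None or cand[0] > new[a + b][0]:
                    new[a + b] = cand
        acc = new
    return max((e for e in acc if e is not None), key=lambda e: e[0])
```

As a usage example, take scores `[0.9, 0.85, 0.8, 0.4, 0.3]` and treat results 0/1 and 2/3 as near-duplicates. Then `diversified_top_k(scores, similar, 3)` selects results (0, 2, 4) with a total of about 2.0. Plain top-3 would instead return the redundant pair 0 and 1.

The combination step is safe because results in different components never conflict. Any union of per-component answers is therefore itself conflict-free, which is why decomposing the results can shrink the search without losing optimality.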