Incremental diversification for very large sets: a streaming-based approach

Authors:
Enrico Minack;Wolf Siberski;Wolfgang Nejdl
Affiliations:
Leibniz Universität Hannover, Hannover, Germany;Leibniz Universität Hannover, Hannover, Germany;Leibniz Universität Hannover, Hannover, Germany
Venue:
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Year:
2011

Abstract

Result diversification is an effective method to reduce the risk that none of the returned results satisfies a user's query intention. It has been shown to decrease query abandonment substantially. On the other hand, computing an optimally diverse set is NP-hard for the usual objectives. Existing greedy diversification algorithms require random access to the input set, rendering them impractical in the context of large result sets or continuous data. To address this issue, we present a novel diversification approach which treats the input as a stream and processes each element incrementally, maintaining a near-optimal diverse set at any point in the stream. Our approach exhibits linear computational complexity and constant memory complexity with respect to input size, without significant loss of diversification quality. In an extensive evaluation on several real-world data sets, we show the applicability and efficiency of our algorithm for large result sets as well as for continuous query scenarios such as news stream subscriptions.
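
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of stream-based diversification, not the authors' method. It assumes a max-sum objective (the sum of pairwise distances within the kept set) and a simple swap rule: each arriving element replaces a kept element only if the swap strictly increases that objective. The names `stream_diversify`, `k`, and `distance` are hypothetical and chosen for illustration.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def stream_diversify(
    stream: Iterable[T],
    k: int,
    distance: Callable[[T, T], float],
) -> List[T]:
    """Single-pass swap heuristic (illustrative assumption, not the paper's algorithm).

    Keeps at most k elements. Each element is seen exactly once, and memory
    use depends only on k, not on the length of the stream.
    """
    kept: List[T] = []
    for item in stream:
        # Fill phase: accept the first k elements unconditionally.
        if len(kept) < k:
            kept.append(item)
            continue

        best_delta = 0.0  # only strictly improving swaps are accepted
        best_idx = -1
        for i in range(k):
            # Pairwise distance lost by removing kept[i] from the set.
            loss = sum(distance(kept[i], kept[j]) for j in range(k) if j != i)
            # Pairwise distance gained by putting the new item in its place.
            gain = sum(distance(item, kept[j]) for j in range(k) if j != i)
            if gain - loss > best_delta:
                best_delta = gain - loss
                best_idx = i

        if best_idx >= 0:
            kept[best_idx] = item
    return kept


# Usage: diversify a stream of numbers under absolute difference.
numbers = iter([0, 1, 2, 10, 5, 20])  # an iterator, so no random access is possible
diverse = stream_diversify(numbers, k=3, distance=lambda a, b: abs(a - b))
print(diverse)
```

Each arriving element costs O(k^2) distance evaluations in this sketch, so for a fixed k the total work grows linearly with the stream length and the memory footprint stays constant, which matches the complexity profile the abstract describes. The swap rule itself carries no approximation guarantee here; it is a stand-in for whatever replacement criterion the paper actually uses.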