Parameter-free and domain-independent similarity search with diversity

Authors:
Lucio F. D. Santos;Willian D. Oliveira;Monica R. P. Ferreira;Agma J. M. Traina;Caetano Traina, Jr.
Affiliations:
University of Sao Paulo - Sao Carlos-SP, Brazil;University of Sao Paulo - Sao Carlos-SP, Brazil;University of Sao Paulo - Sao Carlos-SP, Brazil;University of Sao Paulo - Sao Carlos-SP, Brazil;University of Sao Paulo - Sao Carlos-SP, Brazil
Venue:
Proceedings of the 25th International Conference on Scientific and Statistical Database Management
Year:
2013

Abstract

There is growing demand for new operators that execute similarity-based queries over multimedia data stored in Database Management Systems. However, when searching very large datasets, the basic operators often return elements that are too similar both to the query center and to each other, which reduces the usefulness of the answer. In this paper, we tackle the problem of adding diversity to similarity query results and define techniques that ensure each element in the result set is sufficiently different from the others. Existing techniques require the user to define either a parameter that trades off similarity against diversity or a minimum similarity between result elements. In contrast, our approach provides similarity queries with diversification using the concept of influence, which automatically estimates the inherent diversity among the result set elements and requires no user-defined parameters. Furthermore, our technique can be applied to any data represented in a metric space, so it is independent of both parameters and application domain. The "Better Results with Influence Diversification" (BRID) technique is the basis for the k-Diverse Nearest Neighbor (BRIDk) and Range Diverse (BRIDr) algorithms, which execute k-nearest neighbor and range queries with diversification, showing that the technique can be applied to diversify any type of similarity query. We also define a way to measure the degree of diversification in a result set. Through a detailed experimental evaluation, we show that BRID outperforms existing methods in both diversification quality and execution time, being at least two orders of magnitude faster than the best existing approaches.
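
The abstract does not spell out how influence is computed, so the following Python sketch is only an illustration of the general idea, not the authors' algorithm. It assumes a hypothetical rule: a candidate is "influenced" by an already selected element when the two are at least as close to each other as either one is to the query. The names `is_influenced`, `diverse_knn`, and `diverse_range` are invented for this example, and Euclidean distance stands in for any metric.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Example metric; any function satisfying the metric axioms would do."""
    return math.dist(a, b)


def is_influenced(candidate: Point, selected: Point, query: Point,
                  dist: Metric) -> bool:
    """Assumed influence rule (not taken from the abstract): the candidate
    adds little diversity if it lies at least as close to a selected element
    as that element and the candidate each lie to the query."""
    d_cs = dist(candidate, selected)
    return d_cs <= dist(selected, query) and d_cs <= dist(candidate, query)


def diverse_knn(data: Sequence[Point], query: Point, k: int,
                dist: Metric = euclidean) -> List[Point]:
    """Greedy k-nearest-neighbor search that skips influenced candidates.
    No trade-off parameter or minimum-separation threshold is needed."""
    ranked = sorted(data, key=lambda x: dist(x, query))
    result: List[Point] = []
    for cand in ranked:
        if len(result) == k:
            break
        if not any(is_influenced(cand, s, query, dist) for s in result):
            result.append(cand)
    return result


def diverse_range(data: Sequence[Point], query: Point, radius: float,
                  dist: Metric = euclidean) -> List[Point]:
    """Range query with the same filter: keep every non-influenced element
    whose distance to the query is at most `radius`."""
    in_range = [x for x in data if dist(x, query) <= radius]
    return diverse_knn(in_range, query, len(in_range), dist)


points = [(1.0, 0.0), (1.1, 0.0), (0.0, 2.0), (0.1, 2.0), (-3.0, 0.0)]
origin = (0.0, 0.0)
print(diverse_knn(points, origin, k=3))
print(diverse_range(points, origin, radius=2.5))
```

In this toy dataset, a plain 3-nearest-neighbor query would return the two near-duplicate points next to (1, 0) plus one more, whereas the filtered version skips the near-duplicates and returns one element from each of three separate regions. Because the rule relies only on pairwise distances, the same sketch works for any data with a metric, which matches the domain independence the abstract claims.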