Modelling efficient novelty-based search result diversification in metric spaces

Authors:
Veronica Gil-Costa;Rodrygo L. T. Santos;Craig Macdonald;Iadh Ounis
Affiliations:
Yahoo! Research, Santiago de Chile, Chile and CONICET, Argentina;University of Glasgow, UK;University of Glasgow, UK;University of Glasgow, UK
Venue:
Journal of Discrete Algorithms
Year:
2013


Abstract

Novelty-based diversification provides a way to tackle ambiguous queries by re-ranking a set of retrieved documents. Current approaches are typically greedy, requiring O(n^2) document-document comparisons in order to diversify a ranking of n documents. In this article, we introduce a new approach for novelty-based search result diversification to reduce the overhead incurred by document-document comparisons. To this end, we model novelty promotion as a similarity search in a metric space, exploiting the properties of this space to efficiently identify novel documents. We investigate three different approaches: pivoting-based, clustering-based, and permutation-based. In the first two, a novel document is one that lies outside the range of a pivot or outside a cluster. In the third, a novel document is one that has a different signature (i.e., the document's relative distance to a distinguished set of fixed objects called permutants) compared to previously selected documents. Thorough experiments using two TREC test collections for diversity evaluation, as well as a large sample of the query stream of a commercial search engine, show that our approaches perform at least as effectively as well-known novelty-based diversification approaches in the literature, while dramatically improving their efficiency.
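The permutation-based idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `signature` and `diversify`, the representation of documents as 2-D points, and the use of Euclidean distance are all assumptions made for illustration. Each document's signature is taken to be the order of the permutants by increasing distance, and a document counts as novel when no previously selected document shares that signature. Under these assumptions, a ranking of n documents with m permutants needs n * m distance computations rather than O(n^2) document-document comparisons.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def signature(doc, permutants):
    """Return the permutant indices ordered by increasing distance to doc.

    Hypothetical helper: the abstract only says a signature reflects the
    document's relative distance to the permutants.
    """
    return tuple(sorted(range(len(permutants)),
                        key=lambda i: dist(doc, permutants[i])))


def diversify(ranked_docs, permutants, k):
    """Scan the ranking once, keeping documents whose signature is unseen.

    Assumption: "novel" means no already-selected document has the same
    signature. Returns at most k documents, in their original rank order.
    """
    selected, seen = [], set()
    for doc in ranked_docs:
        sig = signature(doc, permutants)
        if sig not in seen:
            seen.add(sig)
            selected.append(doc)
            if len(selected) == k:
                break
    return selected


# Toy data (made up for illustration): two permutants, four ranked documents.
permutants = [(0, 0), (10, 0)]
ranked_docs = [(1, 0), (2, 0), (9, 0), (8, 1)]
print(diversify(ranked_docs, permutants, k=3))  # prints [(1, 0), (9, 0)]
```

In the toy run, the second and fourth documents are skipped because they are closest to the same permutant as an earlier, higher-ranked document, so only two distinct signatures survive even though k is 3.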