Indexing inexact proximity search with distance regression in pivot space

Authors:
Ole Edsberg; Magnus Lie Hetland
Affiliations:
Norwegian University of Science and Technology, Trondheim, Norway; Norwegian University of Science and Technology, Trondheim, Norway
Venue:
Proceedings of the Third International Conference on SImilarity Search and APplications
Year:
2010

Abstract

We introduce an inexact indexing scheme in which, at index-building time, training queries drawn from the database are used to fit one linear regression model for each object to be indexed. The response variable is the distance from the object to the query; the predictor variables are the distances from the query to each of a set of pivot objects. At search time, the models can provide either distance estimates or probabilities of inclusion in the correct result. Either kind of promise value can be used to rank the objects for an inexact search in which the true distances are computed in the resulting order, up to a halting point. To reduce storage requirements, the coefficients can be discretized at the cost of some precision in the promise values. We evaluate our scheme on synthetic and real-world data and compare it to a permutation-based scheme that has been reported to outperform other methods in the same experimental setting. In several of our experiments, the regression-based distance estimates give better query performance than the permutation-based promise values, in some cases even when the pivot set for the regression-based scheme is reduced so that its index occupies the same memory as the permutation-based index. Limitations of our scheme include high index-building cost and sensitivity to deviations from the model assumptions.
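As an illustration only, the following Python sketch walks through the index-building and ranked-search steps described above. Several details are assumptions that do not come from the paper. Euclidean points stand in for a general metric space. Each model includes an intercept term. Coefficients are fitted by ordinary least squares, with a tiny ridge term added purely for numerical stability. The halting point is a fixed budget of true distance computations. Discretization is plain uniform quantization. All function names (`build_index`, `search`, `quantize`, and the helpers) are hypothetical. Only distance estimates are used for ranking; inclusion probabilities are not modelled.

```python
import math
import random


def solve(A, b):
    """Solve the small dense system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def fit_ols(X, y, ridge=1e-9):
    """Least-squares coefficients for y ~ X via the normal equations.
    The tiny ridge term is an assumption, added only for numerical stability."""
    m = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) + (ridge if i == j else 0.0)
            for j in range(m)] for i in range(m)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(m)]
    return solve(XtX, Xty)


def pivot_features(q, pivots, dist):
    # Predictors: an intercept (assumed) plus the distances from q to each pivot.
    return [1.0] + [dist(q, p) for p in pivots]


def build_index(db, pivots, train_queries, dist):
    # One regression model per database object; the response is d(o, q).
    X = [pivot_features(q, pivots, dist) for q in train_queries]
    return [fit_ols(X, [dist(o, q) for q in train_queries]) for o in db]


def search(db, pivots, coefs, query, dist, k, budget):
    # Rank objects by estimated distance, then compute true distances in that
    # order, halting after `budget` evaluations (hypothetical halting rule).
    x = pivot_features(query, pivots, dist)
    est = [sum(b * xi for b, xi in zip(beta, x)) for beta in coefs]
    order = sorted(range(len(db)), key=lambda i: est[i])
    checked = [(dist(query, db[i]), i) for i in order[:budget]]
    return sorted(checked)[:k]


def quantize(coefs, bits=8):
    # Hypothetical uniform scalar quantization over all coefficients; returns
    # the reconstructed values (a real index would store the integer codes).
    flat = [c for beta in coefs for c in beta]
    lo, hi = min(flat), max(flat)
    step = (hi - lo) / (2 ** bits - 1) or 1.0
    return [[lo + round((c - lo) / step) * step for c in beta] for beta in coefs]


if __name__ == "__main__":
    random.seed(0)
    db = [(random.random(), random.random()) for _ in range(200)]
    pivots = db[:3]                      # pivots taken from the database (assumed)
    train = random.sample(db, 50)        # training queries drawn from the database
    coefs = quantize(build_index(db, pivots, train, math.dist))
    print(search(db, pivots, coefs, (0.5, 0.5), math.dist, k=1, budget=20))
```

With `budget` equal to the database size, the search reduces to an exact scan. Smaller budgets trade accuracy for fewer distance computations, and this trade-off is what the regression-based ranking is meant to improve. Note that computing the query's distances to the pivots also costs true distance evaluations, which would count toward the search cost.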