Indexing inexact proximity search with distance regression in pivot space

  • Authors:
  • Ole Edsberg;Magnus Lie Hetland

  • Affiliations:
  • Norwegian University of Science and Technology, Trondheim, Norway;Norwegian University of Science and Technology, Trondheim, Norway

  • Venue:
  • Proceedings of the Third International Conference on SImilarity Search and APplications
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

We introduce an inexact indexing scheme where, at index building time, training queries drawn from the database are used to fit one linear regression model for each object to be indexed. The response variable is the distance from the object to the query. The predictor variables are the distances from the query to each of a set of pivot objects. At search time, the models can provide distance estimates or probabilities of inclusion in the correct result, either of which can be used to rank the objects for an inexact search where the true distances are calculated in the resulting order, up to a halting point. To reduce storage requirements, the coefficients can be discretized at the cost of some precision in the promise values. We evaluate our scheme on synthetic and real-world data and compare it to a permutation-based scheme that has been reported to outperform other methods in the same experimental setting. We find that, in several of our experiments, the regression-based distance estimates give better query performance than the permutation-based promise values, in some cases even when the pivot set for the regression-based scheme is reduced in order to make its memory size equal to that of the permutation-based index. Limitations of our scheme include high index building cost and vulnerability to deviation from the model assumptions.