We introduce an inexact indexing scheme where, at index building time, training queries drawn from the database are used to fit one linear regression model for each object to be indexed. The response variable is the distance from the object to the query. The predictor variables are the distances from the query to each of a set of pivot objects. At search time, the models can provide distance estimates or probabilities of inclusion in the correct result, either of which can serve as a promise value for ranking the objects in an inexact search, where the true distances are calculated in the resulting order, up to a halting point. To reduce storage requirements, the coefficients can be discretized at the cost of some precision in the promise values. We evaluate our scheme on synthetic and real-world data and compare it to a permutation-based scheme that has been reported to outperform other methods in the same experimental setting. We find that, in several of our experiments, the regression-based distance estimates give better query performance than the permutation-based promise values, in some cases even when the pivot set for the regression-based scheme is reduced in order to make its memory size equal to that of the permutation-based index. Limitations of our scheme include high index building cost and vulnerability to deviation from the model assumptions.
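The build and search steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Euclidean distance, ordinary least squares with an intercept term, and a fixed budget of true distance computations as the halting point. The names `pairwise`, `build_index`, `search`, `n_train`, and `budget` are hypothetical, and the sketch covers only the distance-estimate variant, not the inclusion probabilities or the discretized coefficients.

```python
import numpy as np


def pairwise(a, b):
    """Euclidean distances between every row of a and every row of b (assumed metric)."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)


def build_index(data, pivots, n_train, rng):
    """Fit one linear model per indexed object: d(object, q) ~ pivot distances of q."""
    # Training queries are drawn from the database itself.
    train = data[rng.choice(len(data), size=n_train, replace=False)]
    # Predictors: distances from each training query to each pivot, plus an intercept.
    X = np.hstack([pairwise(train, pivots), np.ones((n_train, 1))])
    # Responses: distances from each training query to each indexed object.
    Y = pairwise(train, data)
    # One least-squares solve fits all models; column j holds the model for object j.
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef


def search(query, data, pivots, coef, k, budget):
    """Inexact k-NN: rank objects by estimated distance, verify only the first `budget`."""
    # Distances from the query to the pivots are the only inputs the models need.
    x = np.append(pairwise(query[None, :], pivots)[0], 1.0)
    estimates = x @ coef
    # Visit objects in order of increasing estimated distance, up to the halting point.
    order = np.argsort(estimates)[:budget]
    true = np.linalg.norm(data[order] - query, axis=1)
    best = np.argsort(true)[:k]
    return order[best], true[best]
```

In this sketch the index stores one coefficient per pivot plus an intercept for every object, so shrinking the pivot set shrinks the index proportionally. Setting `budget` to the database size turns the search into an exact linear scan, while smaller values trade result quality for fewer distance computations.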