Dynamic optimization of queries in pivot-based indexing

Authors:
Svein Erik Bratsberg; Magnus Lie Hetland
Affiliations:
Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway 7491; Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway 7491
Venue:
Multimedia Tools and Applications
Year:
2012

Abstract

This paper evaluates standard database indexes and query processing as a means of metric indexing in the LAESA approach. Using B-trees and R-trees as pivot-based indexes lets us apply well-known database optimization techniques to metric indexing and search. The novelty of this paper is a cost-based approach that dynamically decides which pivots, and how many, to use when evaluating each query. We evaluate the performance of this approach through a series of measurements on our database prototype. Compared to filtering with all available pivots, the optimized approach halves response times for main-memory data but gives much more varied results for disk-resident data. However, the cost model also lets us determine dynamically when to bypass the indexes and simply perform a sequential scan of the base data. We conclude that it is beneficial to create many pivots but to use only the most selective ones when evaluating each query. R-trees outperform B-trees when all pivots are used, but B-trees often perform better when the best pivots can be selected dynamically.
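
The idea of per-query pivot selection can be illustrated with a minimal Python sketch. It is not the paper's cost model or prototype: it uses a plain in-memory LAESA-style distance table instead of B-trees or R-trees, it estimates each pivot's selectivity by counting over the whole table column (a real system would presumably rely on index statistics), and the names `build_pivot_table`, `pivot_selectivity`, `range_query`, `max_pivots`, and `scan_threshold` are hypothetical, as is the Euclidean metric.

```python
from math import dist  # Euclidean distance (Python 3.8+); the metric is an assumption


def build_pivot_table(data, pivots):
    """Precompute d(o, p) for every object o and pivot p (LAESA-style table)."""
    return [[dist(o, p) for p in pivots] for o in data]


def pivot_selectivity(table, j, dqp, r):
    """Fraction of objects pivot j cannot exclude for query radius r.

    Assumption: computed exactly from the table column for simplicity.
    """
    hits = sum(1 for row in table if abs(row[j] - dqp) <= r)
    return hits / len(table)


def range_query(data, pivots, table, q, r, max_pivots=2, scan_threshold=0.5):
    """Return (indices of objects within r of q, number of objects verified)."""
    dq = [dist(q, p) for p in pivots]
    sel = [pivot_selectivity(table, j, dq[j], r) for j in range(len(pivots))]

    # Rank pivots from most to least selective, then keep only those
    # that filter well enough, up to max_pivots (hypothetical policy).
    ranked = sorted(range(len(pivots)), key=lambda j: sel[j])
    chosen = [j for j in ranked if sel[j] <= scan_threshold][:max_pivots]

    if not chosen:
        # No pivot is selective enough: bypass filtering and scan everything.
        candidates = range(len(data))
    else:
        # Triangle inequality: |d(o, p) - d(q, p)| > r implies d(q, o) > r.
        candidates = [
            i for i, row in enumerate(table)
            if all(abs(row[j] - dq[j]) <= r for j in chosen)
        ]

    # Verify the surviving candidates with real distance computations.
    result = [i for i in candidates if dist(q, data[i]) <= r]
    return result, len(candidates)
```

Because the filter only discards objects that the triangle inequality proves are out of range, the result matches a brute-force scan; the selected pivots affect only how many objects must be verified.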