Efficient online top-K retrieval with arbitrary similarity measures

Authors: Prasad M Deshpande; Deepak P; Krishna Kummamuru
Affiliations: IBM India Research Lab, Bangalore, India (all three authors)
Venue: EDBT '08: Proceedings of the 11th international conference on Extending database technology: Advances in database technology
Year: 2008

Cited by (8):

Efficient skyline retrieval with arbitrary similarity measures. Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology.
Efficient processing of exact top-k queries over disk-resident sorted lists. The VLDB Journal.
Rights protection of trajectory datasets with nearest-neighbor preservation. The VLDB Journal.
Efficient RkNN retrieval with arbitrary non-metric similarity measures. Proceedings of the VLDB Endowment.
Efficient reverse skyline retrieval with arbitrary non-metric similarity measures. Proceedings of the 14th International Conference on Extending Database Technology.
Efficient similarity search: arbitrary similarity measures, arbitrary composition. Proceedings of the 20th ACM International Conference on Information and Knowledge Management.
Retrieving similar discussion forum threads: a structure based approach. SIGIR '12: Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval.
Cost-aware query planning for similarity search. Information Systems.

Abstract

The top-k retrieval problem requires finding the k objects most similar to a given query object. Similarities between objects are most often computed by aggregating the similarities of their attribute values. We consider the case where the similarities between attribute values are arbitrary (non-metric), so standard space-partitioning indexes cannot be used. Among the most popular techniques that can handle arbitrary similarity measures is the family of threshold algorithms. These were designed as middleware algorithms: they assume that a similarity list is available for each attribute and focus on merging these lists efficiently to arrive at the results. In this paper, we explore multi-dimensional indexing of non-metric spaces that exploits inter-attribute relationships to prune the search space efficiently during top-k computation. We propose an indexing structure, the AL-Tree, and an algorithm that uses it to perform top-k retrieval in an online fashion. The AL-Tree exploits the fact that many real-world attributes come from a small value space. We show that our algorithm performs much better than threshold-based algorithms in terms of computational cost, owing to efficient pruning of the search space. Further, it outperforms them in terms of I/Os by up to an order of magnitude on dense datasets.
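To make the middleware setting concrete, here is a minimal Python sketch of the Threshold Algorithm (TA) of Fagin, Lotem and Naor, the classic member of the threshold family, next to a brute-force baseline. It assumes sum as the (monotone) aggregation function and per-attribute similarities given as lookup tables that may be asymmetric and non-metric. The names `table_sim`, `score`, `brute_force_topk` and `threshold_topk` are hypothetical, chosen for illustration. This is not the paper's AL-Tree algorithm.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

# A per-attribute similarity: any function sim(query_value, data_value) -> float.
# Nothing requires symmetry or the triangle inequality (it may be non-metric).
Sim = Callable[[str, str], float]


def table_sim(table: Dict[Tuple[str, str], float]) -> Sim:
    """Hypothetical helper: similarity from a lookup table.
    Identical values score 1.0; pairs missing from the table score 0.0."""
    def sim(q: str, v: str) -> float:
        return 1.0 if q == v else table.get((q, v), 0.0)
    return sim


def score(row: Sequence[str], query: Sequence[str], sims: Sequence[Sim]) -> float:
    """Aggregate similarity of one object to the query (sum, a monotone function)."""
    return sum(s(q, v) for s, q, v in zip(sims, query, row))


def brute_force_topk(data, query, sims, k):
    """Baseline: score every object and keep the k best (score, oid) pairs."""
    return heapq.nlargest(k, ((score(row, query, sims), oid) for oid, row in enumerate(data)))


def threshold_topk(data, query, sims, k):
    """Threshold Algorithm (TA) with sum aggregation.
    Returns the top-k (score, oid) pairs and the number of objects fully scored."""
    if k <= 0:
        return [], 0
    # Middleware assumption: one list per attribute, sorted by similarity to the query.
    # Here the lists are built by scoring every value: n evaluations per attribute.
    lists = [sorted(((s(q, row[i]), oid) for oid, row in enumerate(data)), reverse=True)
             for i, (s, q) in enumerate(zip(sims, query))]
    seen = set()
    top: List[Tuple[float, int]] = []  # min-heap holding the best k seen so far
    for depth in range(len(data)):
        # Sorted access: take the next entry from each list in round-robin order.
        for lst in lists:
            _, oid = lst[depth]
            if oid not in seen:
                seen.add(oid)
                # Random access: fetch all attributes of the object and score it fully.
                entry = (score(data[oid], query, sims), oid)
                if len(top) < k:
                    heapq.heappush(top, entry)
                else:
                    heapq.heappushpop(top, entry)
        # Threshold: upper bound on the score of any object not yet seen.
        threshold = sum(lst[depth][0] for lst in lists)
        if len(top) == k and top[0][0] >= threshold:
            break
    return sorted(top, reverse=True), len(seen)


# Usage: two categorical attributes with asymmetric similarity tables
# (("red", "orange") scores 0.7, but ("orange", "red") scores 0.0).
color = table_sim({("red", "orange"): 0.7, ("red", "pink"): 0.5})
size = table_sim({("M", "L"): 0.6, ("M", "S"): 0.3})
data = [("red", "S"), ("orange", "M"), ("blue", "M"), ("pink", "L"), ("red", "M")]
query = ("red", "M")
ta_result, scored = threshold_topk(data, query, [color, size], k=2)
print([oid for _, oid in ta_result], scored)  # prints [4, 1] 4
```

On this toy input TA stops after fully scoring four of the five objects, once the threshold falls to the current second-best score. That saving covers only random accesses: building the per-attribute sorted lists still evaluated every attribute similarity of every object. The AL-Tree instead indexes across attributes to prune the search space; its structure is not described in the abstract, so this sketch does not attempt it.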