Similarity search on Bregman divergence: towards non-metric indexing

Authors:
Zhenjie Zhang;Beng Chin Ooi;Srinivasan Parthasarathy;Anthony K. H. Tung
Affiliations:
National U. of Singapore;National U. of Singapore;Ohio State University;National U. of Singapore
Venue:
Proceedings of the VLDB Endowment
Year:
2009

Citing 21
Cited 8

The R*-tree: an efficient and robust access method for points and rectangles

SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
An optimal algorithm for approximate nearest neighbor searching in fixed dimensions

Journal of the ACM (JACM)
Vector approximation based indexing for non-uniform high dimensional data sets

Proceedings of the ninth international conference on Information and knowledge management
An Algorithm for Finding Best Matches in Logarithmic Expected Time

ACM Transactions on Mathematical Software (TOMS)
R-trees: a dynamic index structure for spatial searching

SIGMOD '84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data
A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces

VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
Empirical Evaluation of Dissimilarity Measures for Color and Texture

ICCV '99 Proceedings of the International Conference on Computer Vision - Volume 2
An Efficient Image Similarity Measure Based on Approximations of KL-Divergence Between Two Gaussian Mixtures

ICCV '03 Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2
Distributional clustering of English words

ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Foundations of probabilistic answers to queries

Proceedings of the 2005 ACM SIGMOD international conference on Management of data
iDistance: An adaptive B+-tree based indexing method for nearest neighbor search

ACM Transactions on Database Systems (TODS)
On approximating the smallest enclosing Bregman Balls

Proceedings of the twenty-second annual symposium on Computational geometry
Cover trees for nearest neighbor

ICML '06 Proceedings of the 23rd international conference on Machine learning
Learning low-rank kernel matrices

ICML '06 Proceedings of the 23rd international conference on Machine learning
Similarity search: a matching based approach

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Clustering with Bregman Divergences

The Journal of Machine Learning Research
Efficient Similarity Search in Nonmetric Spaces with Local Constant Embedding

IEEE Transactions on Knowledge and Data Engineering
Fast nearest neighbor retrieval for Bregman divergences

Proceedings of the 25th international conference on Machine learning
Graph partitioning based on link distributions

AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 1
Bridging the Gap: Query by Semantic Example

IEEE Transactions on Multimedia
Video Annotation Based on Kernel Linear Neighborhood Propagation

IEEE Transactions on Multimedia

Nearest neighbor search: algorithmic perspective

SIGSPATIAL Special
An efficient algorithm for reverse furthest neighbors query with metric index

DEXA'10 Proceedings of the 21st international conference on Database and expert systems applications: Part II
Efficient and effective similarity search over probabilistic data based on earth mover's distance

Proceedings of the VLDB Endowment
Schema-as-you-go: on probabilistic tagging and querying of wide tables

Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Effective data co-reduction for multimedia similarity search

Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Efficient exact edit similarity query processing with the asymmetric signature scheme

Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Approximate Bregman near neighbors in sublinear time: beyond the triangle inequality

Proceedings of the twenty-eighth annual symposium on Computational geometry
Asymmetric signature schemes for efficient exact edit similarity query processing

ACM Transactions on Database Systems (TODS)

Abstract

In this paper, we examine the problem of indexing over non-metric distance functions. In particular, we focus on a general class of distance functions, namely Bregman divergences [6], to support nearest neighbor and range queries. Distance functions such as KL-divergence and Itakura-Saito distance are special cases of Bregman divergence, with wide applications in statistics, speech recognition, and time series analysis, among others. Unlike in metric spaces, key properties such as the triangle inequality and distance symmetry do not hold for such distance functions. A direct adaptation of existing indexing infrastructure developed for metric spaces is thus not possible. We devise a novel solution to handle this class of distance measures by expanding and mapping points in the original space to a new extended space. Subsequently, we show how state-of-the-art tree-based indexing methods, for low- to moderate-dimensional datasets, and vector approximation file (VA-file) methods, for high-dimensional datasets, can be adapted to this extended space to answer such queries efficiently. Improved distance bounding techniques and distribution-based index optimization are also introduced to improve the performance of query answering and index construction, respectively; both can be applied to R-trees and VA-files. Extensive experiments are conducted to validate our approach on a variety of datasets and a range of Bregman divergence functions.
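
To make the abstract's terms concrete, the following is a minimal Python sketch rather than the authors' implementation. It uses the standard definition D_f(x, y) = f(x) - f(y) - <grad f(y), x - y> and checks that the generalized KL-divergence and the Itakura-Saito distance arise from particular generator functions f. The abstract does not spell out the extended-space mapping, so the last part is an assumption: it illustrates one common reading, in which appending f(x) to x as an extra coordinate makes the divergence an affine function of the extended point. All function names here (`bregman`, `extend`, `affine_form`, and so on) are hypothetical and chosen only for illustration.

```python
import numpy as np

def bregman(f, grad_f, x, y):
    """Bregman divergence D_f(x, y) = f(x) - f(y) - <grad f(y), x - y>."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return f(x) - f(y) - np.dot(grad_f(y), x - y)

# Generator f(x) = sum_i x_i log x_i yields the generalized KL-divergence.
def neg_entropy(x):
    return np.sum(x * np.log(x))

def neg_entropy_grad(x):
    return np.log(x) + 1.0

# Generator f(x) = -sum_i log x_i yields the Itakura-Saito distance.
def burg(x):
    return -np.sum(np.log(x))

def burg_grad(x):
    return -1.0 / x

# Generator f(x) = sum_i x_i^2 yields the squared Euclidean distance.
def sq_norm(x):
    return np.sum(x * x)

def sq_norm_grad(x):
    return 2.0 * x

# Closed forms, used only to cross-check the generic definition.
def kl(x, y):
    return np.sum(x * np.log(x / y) - x + y)

def itakura_saito(x, y):
    return np.sum(x / y - np.log(x / y) - 1.0)

# ASSUMPTION (not stated in the abstract): extend x with f(x) as an extra
# coordinate, so that D_f(x, y) = <w, extend(x)> + c for a fixed query y.
def extend(f, x):
    x = np.asarray(x, dtype=float)
    return np.append(x, f(x))

def affine_form(f, grad_f, y):
    y = np.asarray(y, dtype=float)
    g = grad_f(y)
    w = np.append(-g, 1.0)   # weights on (x_1, ..., x_d, f(x))
    c = np.dot(g, y) - f(y)  # constant term, depends on y only
    return w, c

if __name__ == "__main__":
    p = np.array([0.5, 0.3, 0.2])
    q = np.array([0.1, 0.6, 0.3])
    print("KL(p, q):", bregman(neg_entropy, neg_entropy_grad, p, q))
    print("KL(q, p):", bregman(neg_entropy, neg_entropy_grad, q, p))
    print("IS(p, q):", bregman(burg, burg_grad, p, q))
    w, c = affine_form(neg_entropy, neg_entropy_grad, q)
    print("affine form:", np.dot(w, extend(neg_entropy, p)) + c)
```

Running the script prints different values for KL(p, q) and KL(q, p), which illustrates the lack of symmetry. The squared Euclidean case shows the triangle inequality failing: for the one-dimensional points 0, 1 and 2, the divergence from 0 to 2 is 4, while the two intermediate steps sum to 2. If the affine reading is right, a bound on the divergence over a bounding rectangle in the extended space reduces to bounding a linear function over a box, which is the kind of computation R-tree and VA-file pruning relies on.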