The R*-tree: an efficient and robust access method for points and rectangles
SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
An optimal algorithm for approximate nearest neighbor searching in fixed dimensions
Journal of the ACM (JACM)
Vector approximation based indexing for non-uniform high dimensional data sets
Proceedings of the ninth international conference on Information and knowledge management
An Algorithm for Finding Best Matches in Logarithmic Expected Time
ACM Transactions on Mathematical Software (TOMS)
R-trees: a dynamic index structure for spatial searching
SIGMOD '84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data
VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
Empirical Evaluation of Dissimilarity Measures for Color and Texture
ICCV '99 Proceedings of the International Conference on Computer Vision - Volume 2
ICCV '03 Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2
Distributional clustering of English words
ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Foundations of probabilistic answers to queries
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
iDistance: An adaptive B+-tree based indexing method for nearest neighbor search
ACM Transactions on Database Systems (TODS)
On approximating the smallest enclosing Bregman Balls
Proceedings of the twenty-second annual symposium on Computational geometry
Cover trees for nearest neighbor
ICML '06 Proceedings of the 23rd international conference on Machine learning
Learning low-rank kernel matrices
ICML '06 Proceedings of the 23rd international conference on Machine learning
Similarity search: a matching based approach
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Clustering with Bregman Divergences
The Journal of Machine Learning Research
Efficient Similarity Search in Nonmetric Spaces with Local Constant Embedding
IEEE Transactions on Knowledge and Data Engineering
Fast nearest neighbor retrieval for Bregman divergences
Proceedings of the 25th international conference on Machine learning
Graph partitioning based on link distributions
AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 1
Bridging the Gap: Query by Semantic Example
IEEE Transactions on Multimedia
Video Annotation Based on Kernel Linear Neighborhood Propagation
IEEE Transactions on Multimedia
Nearest neighbor search: algorithmic perspective
SIGSPATIAL Special
An efficient algorithm for reverse furthest neighbors query with metric index
DEXA'10 Proceedings of the 21st international conference on Database and expert systems applications: Part II
Efficient and effective similarity search over probabilistic data based on earth mover's distance
Proceedings of the VLDB Endowment
Schema-as-you-go: on probabilistic tagging and querying of wide tables
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Effective data co-reduction for multimedia similarity search
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Efficient exact edit similarity query processing with the asymmetric signature scheme
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Approximate Bregman near neighbors in sublinear time: beyond the triangle inequality
Proceedings of the twenty-eighth annual symposium on Computational geometry
Asymmetric signature schemes for efficient exact edit similarity query processing
ACM Transactions on Database Systems (TODS)
In this paper, we examine the problem of indexing over non-metric distance functions. In particular, we focus on a general class of distance functions, namely Bregman divergences [6], to support nearest neighbor and range queries. Distance functions such as KL-divergence and Itakura-Saito distance are special cases of Bregman divergence, with wide applications in statistics, speech recognition, and time series analysis, among others. Unlike metric distances, such functions satisfy neither the triangle inequality nor symmetry, so existing indexing infrastructure developed for metric spaces cannot be adapted directly. We devise a novel solution to handle this class of distance measures by expanding and mapping points in the original space to a new extended space. We then show how state-of-the-art tree-based indexing methods, for low to moderate dimensional datasets, and vector approximation file (VA-file) methods, for high dimensional datasets, can be adapted to this extended space to answer such queries efficiently. We also introduce improved distance bounding techniques, which speed up query answering, and distribution-based index optimization, which speeds up index construction; both apply to R-trees and VA-files alike. Extensive experiments validate our approach on a variety of datasets and a range of Bregman divergence functions.
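To make the terms in the abstract concrete, the sketch below uses the standard definition of a Bregman divergence, D_f(x, y) = f(x) - f(y) - <grad f(y), x - y>, and checks that the negative-entropy and Burg-entropy generators recover KL-divergence and Itakura-Saito distance. It also illustrates one way an "extended space" can help: appending f(x) as an extra coordinate makes the divergence to a fixed query y an affine function of the extended point. This is an assumption about the flavor of the mapping, not necessarily the construction used in the paper, and all function names (`bregman`, `extend`, `query_affine_form`) are hypothetical.

```python
import math

def bregman(f, grad_f, x, y):
    """Standard Bregman divergence D_f(x, y) = f(x) - f(y) - <grad f(y), x - y>."""
    gy = grad_f(y)
    return f(x) - f(y) - sum(g * (xi - yi) for g, xi, yi in zip(gy, x, y))

# Generator for KL-divergence: negative Shannon entropy.
def neg_entropy(x):
    return sum(xi * math.log(xi) for xi in x)

def neg_entropy_grad(x):
    return [math.log(xi) + 1.0 for xi in x]

# Generator for Itakura-Saito distance: Burg entropy.
def burg(x):
    return -sum(math.log(xi) for xi in x)

def burg_grad(x):
    return [-1.0 / xi for xi in x]

# Closed forms, used only to check the generic definition.
def kl(x, y):
    # Equals the Bregman form when x and y both sum to 1.
    return sum(xi * math.log(xi / yi) for xi, yi in zip(x, y))

def itakura_saito(x, y):
    return sum(xi / yi - math.log(xi / yi) - 1.0 for xi, yi in zip(x, y))

# Illustrative extension (an assumption, not the paper's confirmed mapping):
# lift a d-dimensional point x to the (d + 1)-dimensional point (x, f(x)).
def extend(f, x):
    return list(x) + [f(x)]

def query_affine_form(f, grad_f, y):
    """Return (w, b) such that D_f(x, y) = <w, extend(f, x)> + b for every x."""
    gy = grad_f(y)
    w = [-g for g in gy] + [1.0]
    b = sum(g * yi for g, yi in zip(gy, y)) - f(y)
    return w, b

if __name__ == "__main__":
    p = [0.2, 0.3, 0.5]
    q = [0.4, 0.4, 0.2]
    print("KL(p, q) =", bregman(neg_entropy, neg_entropy_grad, p, q))
    print("KL(q, p) =", bregman(neg_entropy, neg_entropy_grad, q, p))  # differs: no symmetry
    w, b = query_affine_form(burg, burg_grad, q)
    lifted = extend(burg, p)
    print("IS via lifted point =", sum(wi * zi for wi, zi in zip(w, lifted)) + b)
```

Because the divergence to a fixed query is affine in the lifted coordinates, its minimum and maximum over an axis-aligned box in the lifted space are attained at corners of the box. That is one reason rectangle-based structures such as R-trees and VA-files are plausible candidates for this kind of search, although the bounding techniques in the paper itself may differ.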