Sketches are compact bit-string representations of objects. Objects that have the same sketch are stored in the same database bucket. Computing the Hamming distance between two sketches yields an estimate of the similarity of the corresponding objects: objects that are close to each other are expected to have sketches with a small Hamming distance. This estimate is used to schedule the order in which buckets are visited at search time. Recent research has shown that sketches can effectively approximate $L_1$ and $L_2$ distances in high-dimensional settings. What remains open is a general sketch for arbitrary metric spaces. This paper presents a novel sketch, based on generalized hyperplane partitioning, that can be employed in arbitrary metric spaces. The core of the sketch is a heuristic that aims to generate balanced partitions. For comparison, the indexing method AESA stores all pairwise distances between database objects, which allows it to answer queries with very few distance computations. Experimental evaluations show that, given a good early termination strategy, our algorithm performs up to an order of magnitude fewer distance computations than AESA in string spaces. Comparisons against other methods show even larger gains. Furthermore, we demonstrate experimentally that the physical size of the sketches can be reduced by a factor of ten using different run-length encodings.
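The bucket-scheduling idea above can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, not the paper's implementation. Bit i of a sketch records on which side of the generalized hyperplane between a pivot pair (p_i, q_i) an object falls (1 if it is strictly closer to p_i). Objects with identical sketches share a bucket, and buckets are visited in increasing Hamming distance from the query's sketch. The abstract does not describe the balancing heuristic or the early termination strategy, so `choose_pivot_pairs` (best-of-random-candidates balancing) and the `max_buckets` cutoff are hypothetical stand-ins. Levenshtein edit distance is used only as an example string metric.

```python
import random
from collections import defaultdict


def levenshtein(a: str, b: str) -> int:
    """Edit distance: an example metric on the space of strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def choose_pivot_pairs(sample, n_bits, dist, candidates=20, seed=0):
    """Hypothetical balancing heuristic (not the paper's): for each bit, try
    `candidates` random pivot pairs (p, q) and keep the one whose generalized
    hyperplane splits `sample` closest to 50/50."""
    rng = random.Random(seed)
    sample = list(dict.fromkeys(sample))  # distinct objects, so p != q
    pairs = []
    for _ in range(n_bits):
        best, best_skew = None, float("inf")
        for _ in range(candidates):
            p, q = rng.sample(sample, 2)
            ones = sum(dist(x, p) < dist(x, q) for x in sample)
            skew = abs(ones - len(sample) / 2)
            if skew < best_skew:
                best, best_skew = (p, q), skew
        pairs.append(best)
    return pairs


def sketch(x, pairs, dist) -> int:
    """Bit i is 1 if x lies strictly on p_i's side of the hyperplane (p_i, q_i)."""
    bits = 0
    for i, (p, q) in enumerate(pairs):
        if dist(x, p) < dist(x, q):
            bits |= 1 << i
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two sketches."""
    return bin(a ^ b).count("1")


def build_index(db, pairs, dist):
    """Group database objects into buckets keyed by their sketch."""
    buckets = defaultdict(list)
    for x in db:
        buckets[sketch(x, pairs, dist)].append(x)
    return buckets


def search(query, buckets, pairs, dist, max_buckets=3):
    """Visit buckets in increasing Hamming distance from the query's sketch
    and stop after `max_buckets` (a naive, assumed early-termination rule)."""
    qs = sketch(query, pairs, dist)
    order = sorted(buckets, key=lambda s: hamming(s, qs))
    best, best_d = None, float("inf")
    for s in order[:max_buckets]:
        for x in buckets[s]:
            d = dist(query, x)
            if d < best_d:
                best, best_d = x, d
    return best, best_d
```

For example, after `pairs = choose_pivot_pairs(words, n_bits=4, dist=levenshtein)` and `buckets = build_index(words, pairs, levenshtein)` on a small word list, calling `search("sketchy", buckets, pairs, levenshtein, max_buckets=len(buckets))` visits every bucket and returns the exact nearest neighbor. A smaller `max_buckets` trades accuracy for fewer distance computations, and this is where the quality of the Hamming-distance ordering matters. Note that sketching the query itself costs 2 × `n_bits` distance computations.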