It is known that all multi-dimensional index structures fail to accelerate content-based similarity searches when the feature vectors describing images are high-dimensional. This problem can be circumvented by relying on approximate search schemes that trade result quality for reduced query execution time. Most approximate schemes, however, provide no control, or only complex control, over the precision of the searches, especially when retrieving the k nearest neighbors (NNs) of query points. In contrast, this paper describes an approximate search scheme for high-dimensional databases in which the precision of the search can be probabilistically controlled when retrieving the k NNs of query points. It allows fine and intuitive control over this precision by setting, at run time, the maximum probability that a vector belonging to the exact answer set is missing from the approximate answer set eventually returned. This paper also presents a performance study of the implementation on real datasets, showing its reliability and efficiency. It shows, for example, that our method is 6.72 times faster than a sequential scan when handling more than 5 × 10^6 24-dimensional vectors, even when the probability of missing one of the true nearest neighbors is below 0.01.
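To make the precision notion above concrete, the following minimal sketch measures how many true nearest neighbors an approximate answer misses, using a sequential scan as the exact baseline. It is not the scheme described in this paper: `approximate_knn` is a hypothetical stand-in that simply scans a random subset, and the names `exact_knn`, `miss_rate`, and `sample_fraction` are assumptions introduced for illustration only.

```python
import heapq
import math
import random


def exact_knn(vectors, query, k):
    """Sequential scan: indices of the k vectors closest to the query."""
    return heapq.nsmallest(
        k, range(len(vectors)), key=lambda i: math.dist(vectors[i], query)
    )


def approximate_knn(vectors, query, k, sample_fraction, rng):
    """Hypothetical stand-in for an approximate scheme (NOT the paper's method):
    scan only a random subset of the database and keep the k closest."""
    n = max(k, int(len(vectors) * sample_fraction))
    candidates = rng.sample(range(len(vectors)), n)
    return heapq.nsmallest(
        k, candidates, key=lambda i: math.dist(vectors[i], query)
    )


def miss_rate(exact_ids, approx_ids):
    """Fraction of the exact answer set that is absent from the approximate one."""
    exact = set(exact_ids)
    return len(exact - set(approx_ids)) / len(exact)


# Synthetic 24-dimensional data, chosen only to mirror the dimensionality
# mentioned in the abstract; the paper itself uses real datasets.
rng = random.Random(0)
data = [[rng.random() for _ in range(24)] for _ in range(2000)]
query = [rng.random() for _ in range(24)]

truth = exact_knn(data, query, 10)
approx = approximate_knn(data, query, 10, 0.5, rng)
print("empirical miss rate:", miss_rate(truth, approx))
```

Averaging `miss_rate` over many queries gives an empirical estimate that could be compared against a user-chosen bound such as 0.01. The random-subset stand-in offers no such guarantee; it is here only so that the measurement has something to evaluate.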