Approximate searches: k-neighbors + precision

Authors:
Sid-Ahmed Berrani; Laurent Amsaleg; Patrick Gros
Affiliations:
IRISA, Cesson-Sévigné, France; CNRS -- IRISA, Rennes, France; CNRS -- IRISA, Rennes, France
Venue:
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Year:
2003

Cited by 12

Robust content-based image searches for copyright protection. MMDB '03 Proceedings of the 1st ACM international workshop on Multimedia databases
Fast Approximate Similarity Search in Extremely High-Dimensional Data Sets. ICDE '05 Proceedings of the 21st International Conference on Data Engineering
A fast shot matching strategy for detecting duplicate sequences in a television stream. Proceedings of the 2nd international workshop on Computer vision meets databases
Scalability of local image descriptors: a comparative study. MULTIMEDIA '06 Proceedings of the 14th annual ACM international conference on Multimedia
Robust detection of outliers for projection-based face recognition methods. Multimedia Tools and Applications
A probabilistic framework for fusing frame-based searches within a video copy detection system. CIVR '08 Proceedings of the 2008 international conference on Content-based image and video retrieval
A non-supervised approach for repeated sequence detection in TV broadcast streams. Image Communication
Efficient Processing of Nearest Neighbor Queries in Parallel Multimedia Databases. DEXA '08 Proceedings of the 19th international conference on Database and Expert Systems Applications
Approximate similarity search: A multi-faceted problem. Journal of Discrete Algorithms
Finding the Nearest Neighbors in Biological Databases Using Less Distance Computations. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Approximate continuous K-nearest neighbor queries for uncertain objects in road networks. WAIM'11 Proceedings of the 12th international conference on Web-age information management
A data allocation method for efficient content-based retrieval in parallel multimedia databases. ISPA'07 Proceedings of the 2007 international conference on Frontiers of High Performance Computing and Networking

Abstract

It is known that all multi-dimensional index structures fail to accelerate content-based similarity searches when the feature vectors describing images are high-dimensional. This problem can be circumvented by approximate search schemes that trade result quality for reduced query execution time. Most approximate schemes, however, provide either no control or only complex control over the precision of the searches, especially when retrieving the k nearest neighbors (NNs) of query points.

In contrast, this paper describes an approximate search scheme for high-dimensional databases in which the precision of the search can be probabilistically controlled when retrieving the k NNs of query points. It allows fine and intuitive control over this precision: at run time, the user sets the maximum probability that a vector belonging to the exact answer set is missing from the approximate answer set eventually returned. This paper also presents a performance study of the implementation on real datasets that shows its reliability and efficiency. For example, our method is 6.72 times faster than the sequential scan when handling more than 5 × 10^6 24-dimensional vectors, even when the probability of missing one of the true nearest neighbors is below 0.01.
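The abstract describes the scheme only at a high level. What follows is a minimal Python sketch of one generic way a run-time probability threshold can steer an approximate k-NN search over a clustered database. It is not the authors' algorithm. Several pieces are illustrative assumptions: plain k-means stands in for the clustering step, each cluster's member-to-center distances are modelled as a normal distribution, and the names `build_clusters`, `approx_knn`, `precision` and the threshold `eps` are hypothetical. Here `eps` is a heuristic per-cluster pruning threshold, not the per-vector miss-probability guarantee the paper describes. Precision is measured as the fraction of the exact k NNs (from a sequential scan) that the approximate search returns.

```python
import heapq
import math
import random
from statistics import NormalDist, fmean, pstdev


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_knn(data, q, k):
    """Sequential scan: indices of the k vectors closest to q."""
    return sorted(range(len(data)), key=lambda i: dist(data[i], q))[:k]


def build_clusters(data, n_clusters, iters=10, seed=0):
    """Plain k-means, a hypothetical stand-in for the clustering step.

    Returns (center, member_indices, mu, sigma) tuples, where mu and sigma
    summarise each cluster's member-to-center distances (assumed normal).
    """
    rng = random.Random(seed)
    dims = len(data[0])
    centers = [list(data[i]) for i in rng.sample(range(len(data)), n_clusters)]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for i, v in enumerate(data):
            nearest = min(range(len(centers)), key=lambda t: dist(v, centers[t]))
            groups[nearest].append(i)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous center
                centers[j] = [fmean(data[i][d] for i in g) for d in range(dims)]
    clusters = []
    for c, g in zip(centers, groups):
        if g:
            radii = [dist(data[i], c) for i in g]
            clusters.append((c, g, fmean(radii), pstdev(radii)))
    return clusters


def approx_knn(data, clusters, q, k, eps):
    """Visit clusters by increasing center distance and skip a cluster when the
    estimated probability that one of its members can still enter the top k
    falls below eps (a heuristic rule under the normal model, not a proven bound)."""
    heap = []  # max-heap via negated distances: (-distance, index)
    for c, members, mu, sigma in sorted(clusters, key=lambda cl: dist(q, cl[0])):
        r = -heap[0][0] if len(heap) == k else math.inf
        # Triangle inequality: dist(x, q) < r requires dist(x, c) > dist(q, c) - r.
        p = 1.0 - NormalDist(mu, max(sigma, 1e-12)).cdf(dist(q, c) - r)
        if p < eps:
            continue  # pruned: the modelled chance of a closer member is too small
        for i in members:
            d = dist(data[i], q)
            if len(heap) < k:
                heapq.heappush(heap, (-d, i))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, i))
    return [i for _, i in sorted(heap, reverse=True)]  # nearest first


def precision(approx, exact):
    """Fraction of the exact k nearest neighbors returned by the approximate search."""
    return len(set(approx) & set(exact)) / len(exact)


# Usage on synthetic 24-dimensional data (illustrative sizes only).
rng = random.Random(42)
data = [[rng.gauss(0.0, 1.0) for _ in range(24)] for _ in range(1000)]
clusters = build_clusters(data, n_clusters=10)
q = [rng.gauss(0.0, 1.0) for _ in range(24)]
exact = exact_knn(data, q, k=10)
approx = approx_knn(data, clusters, q, k=10, eps=0.01)
print(f"precision = {precision(approx, exact):.2f}")
```

In this sketch, raising `eps` prunes more clusters and computes fewer distances at the cost of precision, while `eps = 0` visits every cluster and returns the same neighbors as the sequential scan.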