NV-Tree: An Efficient Disk-Based Index for Approximate Search in Very Large High-Dimensional Collections

Authors:
Herwig Lejsek;Fridrik Heidar Ásmundsson;Björn Þór Jónsson;Laurent Amsaleg
Affiliations:
Reykjavik University, Reykjavik;Reykjavik University, Reykjavik;Reykjavik University, Reykjavik;CNRS-IRISA, Rennes
Venue:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Year:
2009

Cited by (15):

High-dimensional descriptor indexing for large multimedia databases
Proceedings of the 17th ACM conference on Information and knowledge management

Videntifier™ forensic: a new law enforcement service for automatic identification of illegal video material
MiFor '09 Proceedings of the First ACM workshop on Multimedia in forensics

Videntifier™ forensic: robust and efficient detection of illegal multimedia
MM '09 Proceedings of the 17th ACM international conference on Multimedia

Advanced Techniques in CBIR: Local Descriptors, Visual Dictionaries and Bags of Features
SIBGRAPI-TUTORIALS '09 Proceedings of the 2009 Tutorials of the XXII Brazilian Symposium on Computer Graphics and Image Processing

GPU acceleration of Eff2 descriptors using CUDA
Proceedings of the international conference on Multimedia

Understanding the security and robustness of SIFT
Proceedings of the international conference on Multimedia

Videntifier™ Forensic: large-scale video identification in practice
Proceedings of the 2nd ACM workshop on Multimedia in forensics, security and intelligence

Deluding image recognition in sift-based cbir systems
Proceedings of the 2nd ACM workshop on Multimedia in forensics, security and intelligence

NV-Tree: nearest neighbors at the billion scale
Proceedings of the 1st ACM International Conference on Multimedia Retrieval

Active learning paradigms for CBIR systems based on optimum-path forest classification
Pattern Recognition

Incorporating multiple distance spaces in optimum-path forest classification to improve feedback-based learning
Computer Vision and Image Understanding

Impact of storage technology on the efficiency of cluster-based high-dimensional index creation
DASFAA'12 Proceedings of the 17th international conference on Database Systems for Advanced Applications

Security-oriented picture-in-picture visual modifications
Proceedings of the 2nd ACM International Conference on Multimedia Retrieval

Indexing and searching 100M images with map-reduce
Proceedings of the 3rd ACM conference on International conference on multimedia retrieval

Online image search result grouping with MapReduce-based image clustering and graph construction for large-scale photos
Journal of Visual Communication and Image Representation

Abstract

Over the last two decades, much research effort has been spent on nearest neighbor search in high-dimensional data sets. Most of the approaches published thus far have, however, only been tested on rather small collections. When large collections have been considered, high-performance environments have been used, in particular systems with a large main memory. Accessing data on disk has largely been avoided because disk operations are considered to be too slow. It has been shown, however, that using large amounts of memory is generally not an economical choice. Therefore, we propose the NV-tree, which is a very efficient disk-based data structure that can give good approximate answers to nearest neighbor queries with a single disk operation, even for very large collections of high-dimensional data. Using a single NV-tree, the returned results have high recall but contain a number of false positives. By combining two or three NV-trees, most of those false positives can be avoided while retaining the high recall. Finally, we compare the NV-tree to Locality Sensitive Hashing, a popular method for ε-distance search. We show that they return results of similar quality, but the NV-tree uses far fewer disk reads.
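
The abstract does not say how the results of two or three NV-trees are combined. As a minimal sketch of one plausible approach, assuming each tree returns a list of candidate descriptor identifiers and that a candidate is kept only if enough trees agree on it, the filtering step could look like the following. The function name `combine_candidates`, the `min_votes` parameter, and the sample identifiers are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def combine_candidates(
    result_lists: Sequence[Iterable[int]], min_votes: int = 2
) -> List[int]:
    """Keep identifiers returned by at least `min_votes` of the indexes.

    Illustrative voting scheme only (an assumption, not the paper's method).
    """
    votes: Counter = Counter()
    for results in result_lists:
        # set() so that a single index cannot vote twice for one identifier
        votes.update(set(results))
    kept = [ident for ident, count in votes.items() if count >= min_votes]
    # Most votes first; ties broken by identifier for deterministic output
    return sorted(kept, key=lambda ident: (-votes[ident], ident))


# Hypothetical candidate lists from three independently built indexes
tree_a = [17, 42, 8, 99]
tree_b = [42, 17, 5, 63]
tree_c = [42, 71, 17, 8]

print(combine_candidates([tree_a, tree_b, tree_c], min_votes=2))
```

Under this assumption, a candidate that appears in only one tree (a likely false positive) is dropped, while true neighbors that most trees find are retained, which matches the trade-off the abstract describes.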