Efficient filtering with sketches in the ferret toolkit

Authors:
Qin Lv; William Josephson; Zhe Wang; Moses Charikar; Kai Li
Affiliations:
Princeton University, Princeton, NJ; Princeton University, Princeton, NJ; Princeton University, Princeton, NJ; Princeton University, Princeton, NJ; Princeton University, Princeton, NJ
Venue:
MIR '06 Proceedings of the 8th ACM international workshop on Multimedia information retrieval
Year:
2006


Abstract

Ferret is a toolkit for building content-based similarity search systems for feature-rich data types such as audio, video, and digital photos. The key component of this toolkit is a content-based similarity search engine for generic, multi-feature object representations. This paper describes the filtering mechanism used in the Ferret toolkit and presents experimental results on several datasets. The filtering mechanism uses approximation algorithms to generate a candidate set, and then ranks the objects in the candidate set with a more sophisticated multi-feature distance measure. The paper compares two filtering methods: one using segment feature vectors and one using sketches constructed from segment feature vectors. Our experimental results show that filtering can substantially speed up the search process and reduce memory requirements while maintaining good search quality. To help system designers choose the filtering parameters, we have developed a rank-based analytical model for the filtering algorithm using sketches. Our experiments show that the model gives conservative and good predictions for different datasets.
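
The two-stage search described in the abstract (cheap filtering to build a candidate set, then ranking with a more expensive distance) might be sketched roughly as follows. This is only an illustration under stated assumptions, not Ferret's implementation: the abstract does not say how sketches are constructed, so the example swaps in sign-of-random-projection bit sketches compared by Hamming distance; each object is reduced to a single feature vector instead of multiple segments; and plain Euclidean distance stands in for the multi-feature distance measure. All function and parameter names (`make_hyperplanes`, `sketch`, `filtered_search`, `num_candidates`, `k`) are hypothetical.

```python
import random


def make_hyperplanes(dim, bits, seed=0):
    """Draw `bits` random Gaussian hyperplanes (assumed sketch construction)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def sketch(vec, planes):
    """Pack one sign bit per hyperplane into a Python int."""
    bits = 0
    for i, plane in enumerate(planes):
        if sum(a * b for a, b in zip(vec, plane)) >= 0.0:
            bits |= 1 << i
    return bits


def hamming(a, b):
    """Number of differing bits between two sketches."""
    return bin(a ^ b).count("1")


def euclidean(u, v):
    """Stand-in for the more sophisticated ranking distance (an assumption)."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def filtered_search(query, vectors, sketches, planes, num_candidates, k):
    """Return indices of the k best matches found via filter-then-rank."""
    q = sketch(query, planes)
    # Filtering: cheap sketch distances select a small candidate set.
    by_sketch = sorted(range(len(vectors)), key=lambda i: hamming(q, sketches[i]))
    candidates = by_sketch[:num_candidates]
    # Ranking: the expensive distance is computed only for the candidates.
    return sorted(candidates, key=lambda i: euclidean(query, vectors[i]))[:k]


if __name__ == "__main__":
    rng = random.Random(1)
    data = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(200)]
    planes = make_hyperplanes(dim=16, bits=64)
    sketches = [sketch(v, planes) for v in data]
    print(filtered_search(data[7], data, sketches, planes, num_candidates=20, k=3))
```

In this toy setup, `num_candidates` and the number of sketch bits play the role of the filtering parameters mentioned in the abstract: a larger candidate set or longer sketches should improve search quality at the cost of more time or memory.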