Efficiently matching sets of features with random histograms

Authors:
Wei Dong;Zhe Wang;Moses Charikar;Kai Li
Affiliations:
Princeton University, Princeton, NJ, USA;Princeton University, Princeton, NJ, USA;Princeton University, Princeton, NJ, USA;Princeton University, Princeton, NJ, USA
Venue:
MM '08 Proceedings of the 16th ACM international conference on Multimedia
Year:
2008

References: 21 (first list below)
Cited by: 16 (second list below)

Min-wise independent permutations (extended abstract)

STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
Approximate nearest neighbors: towards removing the curse of dimensionality

STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
The Earth Mover's Distance as a Metric for Image Retrieval

International Journal of Computer Vision
Unsupervised Segmentation of Color-Texture Regions in Images and Video

IEEE Transactions on Pattern Analysis and Machine Intelligence
SIMPLIcity: Semantics-Sensitive Integrated Matching for Picture LIbraries

IEEE Transactions on Pattern Analysis and Machine Intelligence
Similarity estimation techniques from rounding algorithms

STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
Similarity Search in High Dimensions via Hashing

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Distinctive Image Features from Scale-Invariant Keypoints

International Journal of Computer Vision
Locality-sensitive hashing scheme based on p-stable distributions

SCG '04 Proceedings of the twentieth annual symposium on Computational geometry
Image similarity search with compact data structures

Proceedings of the thirteenth ACM international conference on Information and knowledge management
Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories

CVPRW '04 Proceedings of the 2004 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'04) Volume 12
Mercer Kernels for Object Recognition with Local Features

CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 2
The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features

ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision - Volume 2
Training linear SVMs in linear time

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories

CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2
Ferret: a toolkit for content-based similarity search of feature-rich data

Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study

International Journal of Computer Vision
The Pyramid Match Kernel: Efficient Learning with Sets of Features

The Journal of Machine Learning Research
Practical elimination of near-duplicates from web video search

Proceedings of the 15th international conference on Multimedia
PCA-SIFT: a more distinctive representation for local image descriptors

CVPR'04 Proceedings of the 2004 IEEE computer society conference on Computer vision and pattern recognition
A kernel between unordered sets of data: the Gaussian mixture approach

ECML'05 Proceedings of the 16th European conference on Machine Learning

An efficient key point quantization algorithm for large scale image retrieval

LS-MMRM '09 Proceedings of the First ACM workshop on Large-scale multimedia retrieval and mining
Query expansion for hash-based image object retrieval

MM '09 Proceedings of the 17th ACM international conference on Multimedia
Scalable detection of partial near-duplicate videos by visual-temporal consistency

MM '09 Proceedings of the 17th ACM international conference on Multimedia
Large-scale near-duplicate web video search: challenge and opportunity

ICME'09 Proceedings of the 2009 IEEE international conference on Multimedia and Expo
Feature map hashing: sub-linear indexing of appearance and global geometry

Proceedings of the international conference on Multimedia
Real-time large scale near-duplicate web video retrieval

Proceedings of the international conference on Multimedia
Interactive learning of heterogeneous visual concepts with local features

Proceedings of the international conference on Multimedia
Algorithm for detecting significant locations from raw GPS data

DS'10 Proceedings of the 13th international conference on Discovery science
Learning reconfigurable hashing for diverse semantics

Proceedings of the 1st ACM International Conference on Multimedia Retrieval
BASIL: effective near-duplicate image detection using gene sequence alignment

ECIR'2010 Proceedings of the 32nd European conference on Advances in Information Retrieval
Scene signatures for unconstrained news video stories

MMM'12 Proceedings of the 18th international conference on Advances in Multimedia Modeling
High-confidence near-duplicate image detection

Proceedings of the 2nd ACM International Conference on Multimedia Retrieval
Approximate gaussian mixtures for large scale vocabularies

ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part III
Non-parametric hand pose estimation with object context

Image and Vision Computing
Efficient Video Stream Monitoring for Near-Duplicate Detection and Localization in a Large-Scale Repository

ACM Transactions on Information Systems (TOIS)
Towards large-scale geometry indexing by feature selection

Computer Vision and Image Understanding

Abstract

As the commonly used representation of a feature-rich data object has evolved from a single feature vector to a set of feature vectors, a key challenge in building a content-based search engine for feature-rich data is matching feature sets efficiently. Although substantial progress has been made during the past few years, existing approaches are still too inefficient and inflexible for building a search engine over massive datasets. This paper presents a randomized algorithm that embeds a set of features into a single high-dimensional vector, which simplifies the feature-set matching problem. The main idea is to project feature vectors into an auxiliary space using locality sensitive hashing and to represent a set of features as a histogram in that auxiliary space. A histogram is simply a high-dimensional vector, so efficient similarity measures such as the L1 and L2 distances can be used to approximate feature-set distance measures. We evaluated the proposed approach on three tasks: content-based image search, image object recognition, and near-duplicate video clip detection. The experimental results show that the proposed approach is effective and flexible. It achieves accuracy comparable to that of feature-set matching methods while requiring significantly less space and time. For object recognition on the Caltech 101 dataset, our method runs 25 times faster than the Pyramid Match Kernel, the state-of-the-art feature-set matching method, while achieving the same precision.
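
The embedding described in the abstract can be illustrated with a minimal sketch. The abstract does not specify which locality sensitive hash family or which parameters the authors use, so the sketch below assumes random-hyperplane (sign-of-projection) hashing and a single hash table; the function names (`make_hyperplanes`, `lsh_bucket`, `random_histogram`, `l1_distance`) and all parameter values are hypothetical and chosen only for illustration.

```python
import random


def make_hyperplanes(dim, bits, seed=0):
    """Draw `bits` random Gaussian hyperplanes in `dim` dimensions.

    Assumption: random-hyperplane LSH stands in for whatever hash
    family the paper actually uses.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def lsh_bucket(vec, planes):
    """Map one feature vector to a bucket index, one bit per hyperplane."""
    bucket = 0
    for plane in planes:
        dot = sum(p * x for p, x in zip(plane, vec))
        bucket = (bucket << 1) | (1 if dot >= 0 else 0)
    return bucket


def random_histogram(features, planes):
    """Embed a set of feature vectors as a histogram over LSH buckets."""
    hist = [0] * (1 << len(planes))  # 2**bits buckets
    for vec in features:
        hist[lsh_bucket(vec, planes)] += 1
    return hist


def l1_distance(h1, h2):
    """L1 distance between two histograms of equal length."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


# Two sets holding the same features in a different order.
planes = make_hyperplanes(dim=4, bits=3, seed=42)
set_a = [[1.0, 0.2, 0.0, 0.1], [0.0, 1.0, 0.3, 0.0]]
set_b = [[0.0, 1.0, 0.3, 0.0], [1.0, 0.2, 0.0, 0.1]]
hist_a = random_histogram(set_a, planes)
hist_b = random_histogram(set_b, planes)
print(l1_distance(hist_a, hist_b))
```

Because each feature is hashed independently and the histogram only counts bucket occupancy, the embedding does not depend on the order of features within a set, and two sets can be compared with a single vector distance rather than a pairwise feature matching. In practice one would likely concatenate histograms from several independent hash tables to reduce the variance of the estimate; that choice is also an assumption here, not a detail taken from the abstract.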