A unified approach to learning task-specific bit vector representations for fast nearest neighbor search

  • Authors:
  • Vinod Nair, Dhruv Mahajan, Sundararajan Sellamanickam

  • Affiliation:
  • Yahoo! Labs Bangalore, Bangalore, India (all authors)

  • Venue:
  • Proceedings of the 21st international conference on World Wide Web
  • Year:
  • 2012


Abstract

Fast nearest neighbor search is necessary for a variety of large-scale web applications such as information retrieval, nearest neighbor classification, and nearest neighbor regression. Recently, a number of machine learning algorithms have been proposed for representing the data to be searched as (short) bit vectors and then using hashing to do rapid search. These algorithms have been limited in their applicability in that they are suited to only one type of task; for example, Spectral Hashing learns bit vector representations for retrieval, but not, say, classification. In this paper we present a unified approach to learning bit vector representations for many applications that use nearest neighbor search. The main contribution is a single learning algorithm that can be customized to learn a bit vector representation suited to the task at hand. This broadens the usefulness of bit vector representations to tasks beyond conventional retrieval. We propose a learning-to-rank formulation to learn the bit vector representation of the data. The LambdaRank algorithm is used for learning a function that computes a task-specific bit vector from an input data vector. Our approach outperforms state-of-the-art nearest neighbor methods on a number of real-world text and image classification and retrieval datasets. It is scalable, learning a 32-bit representation from 1.46 million training cases in two days.
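To make the setting concrete, the sketch below illustrates the generic pipeline the abstract describes: map real-valued data vectors to short bit vectors with a learned function, then do nearest neighbor search in Hamming space. This is a minimal illustration, not the paper's method; the projection matrix `W` here is a random placeholder standing in for whatever function the LambdaRank-based training would learn, and the sign-thresholding scheme and brute-force Hamming scan are assumptions for the sake of a self-contained example.

```python
import numpy as np

def to_bit_vectors(X, W):
    """Map real-valued rows of X to bit vectors via a linear projection
    followed by sign thresholding. W stands in for a learned function;
    the paper learns such a function with a learning-to-rank objective."""
    return (X @ W > 0).astype(np.uint8)

def hamming_search(query_bits, db_bits, k=5):
    """Return indices of the k database items whose bit vectors are
    closest to the query in Hamming distance."""
    dists = np.count_nonzero(db_bits != query_bits, axis=1)
    return np.argsort(dists)[:k]

# Toy usage: 32-bit codes, matching the code length reported in the abstract.
rng = np.random.default_rng(0)
X_db = rng.standard_normal((1000, 64))   # 1000 database vectors, 64-dim
W = rng.standard_normal((64, 32))        # placeholder for a learned projection
db_bits = to_bit_vectors(X_db, W)

query = rng.standard_normal((1, 64))
query_bits = to_bit_vectors(query, W)
print(hamming_search(query_bits, db_bits, k=5))
```

In practice the point of short codes is that Hamming distances over packed bits are far cheaper than Euclidean distances over floats, and hash-table lookups on the codes can replace the linear scan shown here; the paper's contribution is in how `W` (more generally, the encoding function) is trained so that proximity in Hamming space reflects the target task.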