A better similarity index structure for high-dimensional feature datapoints is highly desirable for building scalable content-based search systems on feature-rich datasets. In this paper, we introduce sparse principal component analysis (Sparse PCA) and Boosting Similarity Sensitive Coding (Boosting SSC) into traditional spectral hashing to obtain effective, data-aware binary codes for real data. We call this approach Sparse Spectral Hashing (SSH). SSH formulates binary coding as thresholding a subset of eigenvectors of the graph Laplacian while constraining the number of nonzero features. A convex relaxation and eigenfunction learning make the resulting codes globally optimal and generalizable to datapoints outside the training data. Comparisons in terms of F1 score and AUC show that SSH substantially outperforms other methods on both image and text datasets.
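As a rough illustration of the spectral-hashing core that SSH builds on, the sketch below computes binary codes by thresholding eigenvectors of a graph Laplacian built from training data. This is a toy sketch only, not the authors' SSH method: the sparsity constraint on features (Sparse PCA), the Boosting SSC component, and the convex relaxation are all omitted, and the function name and parameters are illustrative assumptions.

```python
import numpy as np

def spectral_binary_codes(X, n_bits=2, sigma=1.0):
    """Toy sketch: binary codes from thresholded graph-Laplacian eigenvectors.

    Illustrates only the spectral-hashing core assumed by the abstract;
    SSH additionally constrains the number of nonzero features and uses
    a convex relaxation and eigenfunction learning (not shown here).
    """
    # Pairwise Gaussian affinities between the n training points.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    # Unnormalized graph Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W
    # Eigenvectors in ascending eigenvalue order; skip the trivial
    # constant eigenvector at eigenvalue 0.
    _, vecs = np.linalg.eigh(L)
    Y = vecs[:, 1:n_bits + 1]
    # Threshold each eigenvector at zero to obtain one bit per point.
    return (Y > 0).astype(np.uint8)

# Two moderately separated clusters: the Fiedler-vector bit should
# split them cleanly.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (10, 5)),
               rng.normal(2.0, 0.1, (10, 5))])
codes = spectral_binary_codes(X, n_bits=2)
```

Thresholding at zero is the simplest choice; the median of each eigenvector is another common threshold that balances the bit distribution.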