A better similarity index structure for high-dimensional feature datapoints is highly desirable for building scalable content-based search systems on feature-rich datasets. In this paper, we introduce sparse principal component analysis (Sparse PCA) and Boosting Similarity Sensitive Coding (Boosting SSC) into traditional spectral hashing to obtain effective, data-aware binary codes for real data. We call this approach Sparse Spectral Hashing (SSH). SSH formulates binary coding as thresholding a subset of eigenvectors of the graph Laplacian while constraining the number of nonzero features. A convex relaxation and eigenfunction learning make the resulting codes globally optimal and generalizable to datapoints outside the training data. Comparisons in terms of F1 score and AUC show that SSH substantially outperforms other methods on both image and text datasets.
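As a rough illustration of the spectral-hashing core that SSH builds on, the sketch below computes binary codes by thresholding eigenvectors of a graph Laplacian built from training data. This is a toy sketch only, not the authors' SSH method: the sparsity constraint on features (Sparse PCA), the Boosting SSC component, and the convex relaxation are all omitted, and the function name and parameters are illustrative assumptions.

```python
import numpy as np

def spectral_binary_codes(X, n_bits=2, sigma=1.0):
    """Toy sketch: binary codes from thresholded graph-Laplacian eigenvectors.

    Illustrates only the spectral-hashing core assumed by the abstract;
    SSH additionally constrains the number of nonzero features and uses
    a convex relaxation and eigenfunction learning (not shown here).
    """
    # Pairwise Gaussian affinities between the n training points.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    # Unnormalized graph Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W
    # Eigenvectors in ascending eigenvalue order; skip the trivial
    # constant eigenvector at eigenvalue 0.
    _, vecs = np.linalg.eigh(L)
    Y = vecs[:, 1:n_bits + 1]
    # Threshold each eigenvector at zero to obtain one bit per point.
    return (Y > 0).astype(np.uint8)

# Two moderately separated clusters: the Fiedler-vector bit should
# split them cleanly.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (10, 5)),
               rng.normal(2.0, 0.1, (10, 5))])
codes = spectral_binary_codes(X, n_bits=2)
```

Thresholding at zero is the simplest choice; the median of each eigenvector is another common threshold that balances the bit distribution.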