Semantic hashing

Authors:
Ruslan Salakhutdinov;Geoffrey Hinton
Affiliations:
Department of Computer Science, University of Toronto, 6 King's College Road, Toronto, Ontario, Canada M5S 3G4;Department of Computer Science, University of Toronto, 6 King's College Road, Toronto, Ontario, Canada M5S 3G4
Venue:
International Journal of Approximate Reasoning
Year:
2009

Abstract

We show how to learn a deep graphical model of the word-count vectors obtained from a large set of documents. The values of the latent variables in the deepest layer are easy to infer and give a much better representation of each document than Latent Semantic Analysis. When the deepest layer is forced to use a small number of binary variables (e.g. 32), the graphical model performs "semantic hashing": documents are mapped to memory addresses in such a way that semantically similar documents are located at nearby addresses. Documents similar to a query document can then be found by simply accessing all the addresses that differ by only a few bits from the address of the query document. This way of extending the efficiency of hash-coding to approximate matching is much faster than locality-sensitive hashing, which is the fastest current method. By using semantic hashing to filter the documents given to TF-IDF, we achieve higher accuracy than applying TF-IDF to the entire document set.
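
The lookup step described in the abstract, accessing every address that differs from the query address by only a few bits, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`hamming_ball`, `build_table`, `lookup`), the 8-bit code length, and the example codes are assumptions made for illustration, and the codes are hard-coded here rather than inferred from a learned model.

```python
from collections import defaultdict
from itertools import combinations


def hamming_ball(code: int, n_bits: int, radius: int):
    """Yield every n_bits-wide address within `radius` bit flips of `code`."""
    for r in range(radius + 1):
        for positions in combinations(range(n_bits), r):
            flipped = code
            for p in positions:
                flipped ^= 1 << p  # flip bit p
            yield flipped


def build_table(codes):
    """Map each binary address to the list of document ids stored there."""
    table = defaultdict(list)
    for doc_id, code in codes.items():
        table[code].append(doc_id)
    return table


def lookup(table, query_code, n_bits, radius):
    """Collect documents whose address differs from the query in at most `radius` bits."""
    hits = []
    for address in hamming_ball(query_code, n_bits, radius):
        hits.extend(table.get(address, []))
    return hits


# Hypothetical 8-bit codes (assumption); in the paper the codes come from
# the deepest layer of the learned model and are longer (e.g. 32 bits).
codes = {
    "doc_a": 0b10110010,
    "doc_b": 0b10110011,  # 1 bit away from doc_a
    "doc_c": 0b10100011,  # 2 bits away from doc_a
    "doc_d": 0b01001100,  # far from doc_a
}
table = build_table(codes)
shortlist = lookup(table, 0b10110010, n_bits=8, radius=1)
```

In this sketch the number of addresses probed depends only on the code length and the radius, not on the size of the collection. The resulting shortlist could then be handed to a TF-IDF ranker, which is the filtering arrangement the abstract describes.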