Designing efficient and effective solutions for large-scale similarity search is an important research problem. One popular strategy is to represent data examples as compact binary codes through semantic hashing, which has produced promising results with fast search speed and low storage cost. Many existing semantic hashing methods generate binary codes for documents by modeling document relationships based on similarity in a keyword feature space. Existing methods have two major limitations: (1) tag information is often associated with documents in real-world applications but has not yet been fully exploited; (2) similarity in the keyword feature space does not fully reflect semantic relationships that go beyond keyword matching. This paper proposes a novel hashing approach, Semantic Hashing using Tags and Topic Modeling (SHTTM), which incorporates both tag information and similarity information from probabilistic topic modeling. In particular, a unified framework is designed to ensure, through a formal latent factor model, that hashing codes are consistent with tag information, and to preserve document topic/semantic similarity that goes beyond keyword matching. An iterative coordinate descent procedure is proposed for learning the optimal hashing codes. An extensive set of empirical studies on four different datasets has been conducted to demonstrate the advantages of the proposed SHTTM approach over several other state-of-the-art semantic hashing techniques. Furthermore, experimental results indicate that modeling tag information and utilizing topic modeling each improve hashing effectiveness on their own, while combining the two techniques in the unified framework yields even better results.
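To make the two ingredients above more concrete, the following Python sketch learns binary codes with a *simplified* objective that loosely mirrors them. The first is a latent factor term that reconstructs a tag matrix T from relaxed codes Y through tag factors U. The second is a graph-Laplacian term built from topic similarity, which pulls documents with similar topics toward similar codes. This is not the SHTTM formulation from the paper. The ridge penalties, the alternating (block coordinate descent) updates, the per-bit median thresholding, and all names (`learn_codes`, `hamming_rank`, `alpha`, `lam`, `mu`) are assumptions for illustration. The topic proportions `theta` are assumed to come from a separately trained topic model such as LDA.

```python
import numpy as np


def topic_similarity(theta: np.ndarray) -> np.ndarray:
    """Cosine similarity between document topic distributions.

    theta: (n_docs, n_topics) topic proportions, assumed precomputed
    (e.g. by an LDA model); computing them is out of scope here.
    """
    normed = theta / np.linalg.norm(theta, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, 0.0)  # drop self-similarity from the graph
    return sim


def objective(tags, U, Y, L, alpha, lam, mu):
    """Relaxed loss: tag reconstruction + topic smoothness + ridge penalties."""
    recon = np.sum((tags - U.T @ Y) ** 2)      # ||T - U^T Y||_F^2
    smooth = alpha * np.trace(Y @ L @ Y.T)     # similar topics -> similar codes
    return recon + smooth + lam * np.sum(U ** 2) + mu * np.sum(Y ** 2)


def learn_codes(tags, theta, n_bits=8, alpha=1.0, lam=0.1, mu=0.1,
                n_iter=20, seed=0):
    """Hypothetical block coordinate descent over U and a relaxed Y.

    tags:  (n_tags, n_docs) binary tag matrix T
    theta: (n_docs, n_topics) topic proportions
    Returns binary codes of shape (n_bits, n_docs) and the loss history.
    """
    rng = np.random.default_rng(seed)
    n_docs = tags.shape[1]
    S = topic_similarity(theta)
    L = np.diag(S.sum(axis=1)) - S             # graph Laplacian of S
    evals, V = np.linalg.eigh(L)               # L = V diag(evals) V^T
    I = np.eye(n_bits)
    Y = rng.standard_normal((n_bits, n_docs))  # relaxed (real-valued) codes
    history = []
    for _ in range(n_iter):
        # U-step: ridge regression of tags on codes, closed form
        U = np.linalg.solve(Y @ Y.T + lam * I, Y @ tags.T)
        # Y-step: solve (U U^T + mu I) Y + alpha Y L = U T.
        # Substituting Z = Y V decouples it into one linear system per column.
        A = U @ U.T + mu * I
        B = U @ tags @ V
        Z = np.column_stack([
            np.linalg.solve(A + alpha * evals[j] * I, B[:, j])
            for j in range(n_docs)
        ])
        Y = Z @ V.T
        history.append(objective(tags, U, Y, L, alpha, lam, mu))
    # Binarize each bit at its median so that bits are roughly balanced
    codes = (Y > np.median(Y, axis=1, keepdims=True)).astype(np.uint8)
    return codes, history


def hamming_rank(query_code, db_codes):
    """Rank database documents by Hamming distance to a query code."""
    dists = np.count_nonzero(db_codes != query_code[:, None], axis=0)
    return np.argsort(dists, kind="stable"), dists


if __name__ == "__main__":
    # Toy data: two groups of four documents with disjoint tags and topics
    tags = np.array([[1, 1, 1, 1, 0, 0, 0, 0],
                     [1, 1, 1, 1, 0, 0, 0, 0],
                     [0, 0, 0, 0, 1, 1, 1, 1],
                     [0, 0, 0, 0, 1, 1, 1, 1]], dtype=float)
    theta = np.array([[0.9, 0.1]] * 4 + [[0.1, 0.9]] * 4)
    codes, history = learn_codes(tags, theta)
    order, dists = hamming_rank(codes[:, 0], codes)
    print(order, dists)
```

Because each block update is an exact minimization of a convex subproblem, the relaxed loss does not increase from one iteration to the next. Once codes are learned, answering a query reduces to counting differing bits, which is where the speed and storage advantages of binary codes come from.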