Semi-supervised learning of compact document representations with deep networks

Authors:
Marc'Aurelio Ranzato;Martin Szummer
Affiliations:
New York University, New York, NY;Microsoft Research Cambridge, Cambridge, UK
Venue:
Proceedings of the 25th international conference on Machine learning
Year:
2008

Citing (6)

Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval

Efficient BackProp
Neural Networks: Tricks of the Trade

Latent Dirichlet allocation
The Journal of Machine Learning Research

The rate adapting Poisson model for information retrieval and object recognition
ICML '06 Proceedings of the 23rd international conference on Machine learning

A fast learning algorithm for deep belief nets
Neural Computation

Probabilistic latent semantic analysis
UAI'99 Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence

Cited by (10)

Large-scale deep unsupervised learning using graphics processors
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning

Learning Deep Architectures for AI
Foundations and Trends® in Machine Learning

Deep bottleneck classifiers in supervised dimension reduction
ICANN'10 Proceedings of the 20th international conference on Artificial neural networks: Part III

Active deep networks for semi-supervised sentiment classification
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters

GPU-accelerated restricted Boltzmann machine for collaborative filtering
ICA3PP'12 Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I

Computing text semantic relatedness using the contents and links of a hypertext encyclopedia
Artificial Intelligence

DTW-D: time series semi-supervised learning from a single example
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

Nonparametric guidance of autoencoder representations using label information
The Journal of Machine Learning Research

Pedestrian detection based on kernel discriminative sparse representation
Transactions on Edutainment IX

Pattern classification and clustering: A review of partially supervised learning approaches
Pattern Recognition Letters

Abstract

Finding good representations of text documents is crucial in information retrieval and classification systems. Today the most popular document representation is based on a vector of word counts in the document. This representation neither captures dependencies between related words, nor handles synonyms or polysemous words. In this paper, we propose an algorithm to learn text document representations based on semi-supervised autoencoders that are stacked to form a deep network. The model can be trained efficiently on partially labeled corpora, producing very compact representations of documents, while retaining as much class information and joint word statistics as possible. We show that it is advantageous to exploit even a few labeled samples during training.
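
The limitation of word-count vectors noted in the abstract can be illustrated with a minimal sketch. This is not the paper's method; the toy documents and the helper names `count_vector` and `cosine` are hypothetical and chosen only for illustration. Two documents that use synonyms but share no words end up with orthogonal count vectors, so their cosine similarity is zero.

```python
from collections import Counter
import math


def count_vector(text, vocabulary):
    """Return the word-count vector of `text` over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical toy documents: same meaning, no shared words.
docs = ["fast car", "quick automobile"]

# Vocabulary is the sorted set of all words seen in the toy corpus.
vocabulary = sorted({word for doc in docs for word in doc.split()})
vectors = [count_vector(doc, vocabulary) for doc in docs]

# The count vectors have no overlapping nonzero entries.
similarity = cosine(vectors[0], vectors[1])
print(vocabulary, vectors, similarity)
```

A learned compact representation, such as the one the abstract proposes, aims to place documents like these close together even though their raw count vectors do not overlap.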