We propose an efficient way to train maximum entropy language models (MELMs) and neural network language models (NNLMs). The efficiency gain comes from a more robust subsampling technique: the original multiclass language modeling problem is transformed into a set of binary problems, where each binary classifier predicts whether or not a particular word occurs. We show that the binarized model is as powerful as the standard model and allows us to aggressively subsample negative training examples without sacrificing predictive performance. Empirical results show that we can train MELMs and NNLMs at 1%-5% of the standard training complexity with no loss in performance.
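The sketch below illustrates the general idea described in the abstract under stated assumptions; it is not the paper's implementation. Names such as `neg_rate`, the indicator features, and the toy corpus are hypothetical, and the per-word logistic classifiers stand in for the binarized MELM: each (context, target) pair yields one positive example, while competing vocabulary words are kept as negatives only with a small sampling probability.

```python
import math
import random
from collections import defaultdict


def binarize_with_subsampling(corpus, vocab, neg_rate=0.05, seed=0):
    """Turn multiclass next-word prediction into binary examples.

    For each (context, target) pair, emit one positive example for the
    observed word; every other vocabulary word is kept as a negative
    example only with probability `neg_rate` (aggressive subsampling).
    """
    rng = random.Random(seed)
    examples = []
    for context, target in corpus:
        examples.append((context, target, 1))          # word did occur
        for w in vocab:
            if w != target and rng.random() < neg_rate:
                examples.append((context, w, 0))       # sampled negative
    return examples


def train_binary_classifiers(examples, lr=0.1, epochs=5):
    """One logistic (maximum-entropy-style) classifier per word,
    trained by SGD on simple context-word indicator features."""
    weights = defaultdict(lambda: defaultdict(float))  # word -> feature -> weight
    for _ in range(epochs):
        for context, word, label in examples:
            feats = [("ctx", c) for c in context] + [("bias",)]
            score = sum(weights[word][f] for f in feats)
            prob = 1.0 / (1.0 + math.exp(-score))
            grad = label - prob                        # gradient of log-likelihood
            for f in feats:
                weights[word][f] += lr * grad
    return weights


if __name__ == "__main__":
    vocab = ["the", "cat", "sat", "mat", "dog"]
    corpus = [(("the",), "cat"), (("cat",), "sat"), (("the",), "dog")]
    binary_examples = binarize_with_subsampling(corpus, vocab, neg_rate=0.4)
    model = train_binary_classifiers(binary_examples)
    print(len(binary_examples), "binary examples from", len(corpus), "multiclass ones")
```

With a realistic vocabulary size, the number of negative examples per position drops from |V|-1 to roughly neg_rate * |V|, which is where the reported 1%-5% training cost would come from under this reading.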