Discriminative n-gram language modeling

  • Authors:
  • Brian Roark; Murat Saraclar; Michael Collins

  • Affiliations:
  • Center for Spoken Language Understanding, OGI School of Science and Engineering at Oregon Health and Science University, 20000 NW Walker Road, Beaverton, OR 97006, United States; Boğaziçi University, 34342 Bebek, Istanbul, Turkey; MIT CSAIL/EECS Stata Center, Building 32-G484, Cambridge, MA 02139, United States

  • Venue:
  • Computer Speech and Language
  • Year:
  • 2007

Abstract

This paper describes discriminative language modeling for a large vocabulary speech recognition task. We contrast two parameter estimation methods: the perceptron algorithm and a method based on maximizing the regularized conditional log-likelihood. The models are encoded as deterministic weighted finite-state automata and are applied by intersecting the automata with word lattices output by a baseline recognizer. The perceptron algorithm has the benefit of automatically selecting a relatively small feature set in just a couple of passes over the training data. We describe a method based on regularized likelihood that uses the feature set given by the perceptron algorithm and is initialized with the perceptron's weights; this method gives an additional 0.5% reduction in word error rate (WER) over training with the perceptron alone. The final system achieves a 1.8% absolute WER reduction for a baseline first-pass recognition system (from 39.2% to 37.4%) and a 0.9% absolute WER reduction for a multi-pass recognition system (from 28.9% to 28.0%).
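
As a rough illustration of the perceptron step the abstract summarizes, the sketch below shows a structured-perceptron update for discriminative n-gram reranking in Python. It is a simplified sketch under stated assumptions, not the paper's implementation: the paper applies the model by intersecting a deterministic weighted automaton with the full word lattice and uses parameter averaging, whereas this sketch works over hypothetical n-best lists; the function names, data layout, and trigram order are illustrative assumptions.

```python
from collections import defaultdict

def ngram_features(words, n=3):
    """Count the n-grams (orders 1..n) of a hypothesis; these counts
    serve as the discriminative features of the model."""
    feats = defaultdict(int)
    padded = ["<s>"] * (n - 1) + list(words) + ["</s>"]
    for order in range(1, n + 1):
        for i in range(len(padded) - order + 1):
            feats[tuple(padded[i:i + order])] += 1
    return feats

def model_score(weights, hyp_words, baseline_score, baseline_weight=1.0):
    """Combined score: baseline recognizer score plus learned n-gram weights."""
    feats = ngram_features(hyp_words)
    return baseline_weight * baseline_score + sum(
        weights.get(f, 0.0) * c for f, c in feats.items())

def perceptron_epoch(weights, training_data):
    """One structured-perceptron pass over n-best lists.

    training_data: iterable of (nbest, oracle_index) pairs, where nbest is
    a list of (words, baseline_score) hypotheses and oracle_index marks the
    lowest-WER hypothesis. This data layout is a hypothetical stand-in for
    the paper's lattice-based setup.
    """
    for nbest, oracle_index in training_data:
        # Hypothesis the current model would pick.
        best = max(range(len(nbest)),
                   key=lambda i: model_score(weights, nbest[i][0], nbest[i][1]))
        if best == oracle_index:
            continue  # model already agrees with the oracle; no update
        # Promote the oracle's n-grams, demote those of the wrong choice.
        for f, c in ngram_features(nbest[oracle_index][0]).items():
            weights[f] = weights.get(f, 0.0) + c
        for f, c in ngram_features(nbest[best][0]).items():
            weights[f] = weights.get(f, 0.0) - c
    return weights
```

Only n-grams touched by an update ever receive non-zero weight, which is how the perceptron implicitly selects a compact feature set in a few passes; per the abstract, those features and weights then seed the regularized conditional log-likelihood training that yields the further 0.5% WER reduction.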