Improving language model size reduction using better pruning criteria

Authors:
Jianfeng Gao; Min Zhang
Affiliations:
Microsoft Research Asia, Beijing, China; Tsinghua University, China
Venue:
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Year:
2002

Abstract

Reducing language model (LM) size is critical when an LM is deployed in realistic applications that have memory constraints. In this paper, we study three measures for LM pruning: probability, rank, and entropy. We evaluate the three pruning criteria in a real application, Chinese text input, in terms of character error rate (CER). We first present an empirical comparison showing that rank performs best in most cases. We also show that the high performance of rank stems from its strong correlation with error rate. We then present a novel method of combining two criteria in model pruning. Experimental results show that, at the same CER, the combined criterion consistently yields smaller models than pruning with either criterion alone.
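
The abstract names the pruning criteria without defining them, so the following is only a toy sketch of how two different criteria can select different n-grams to discard. Everything in it is an assumption made for illustration: the bigram probabilities are invented, `probability_score` and `rank_score` are simplified stand-ins rather than the paper's formulations, and `prune` keeps a fixed number of entries instead of applying a threshold.

```python
from collections import defaultdict

# Toy bigram model P(word | history). Values are invented for illustration.
bigrams = {
    ("the", "cat"): 0.45,
    ("the", "dog"): 0.35,
    ("the", "emu"): 0.20,
    ("a", "cat"): 0.90,
    ("a", "dog"): 0.10,
}

def probability_score(model):
    """Assumed criterion: score a bigram by its conditional probability."""
    return dict(model)

def rank_score(model):
    """Assumed criterion: score a bigram by the negated rank of the word
    among all continuations of the same history (rank 1 = most probable)."""
    by_history = defaultdict(list)
    for (history, word), prob in model.items():
        by_history[history].append((prob, word))
    scores = {}
    for history, items in by_history.items():
        ordered = sorted(items, reverse=True)
        for rank, (_, word) in enumerate(ordered, start=1):
            scores[(history, word)] = -rank
    return scores

def prune(model, scores, keep):
    """Keep the `keep` highest-scoring bigrams. In a backoff model the
    discarded entries would be estimated from lower-order statistics."""
    kept = sorted(model, key=lambda key: scores[key], reverse=True)[:keep]
    return {key: model[key] for key in kept}

by_probability = prune(bigrams, probability_score(bigrams), keep=4)
by_rank = prune(bigrams, rank_score(bigrams), keep=4)

print("dropped by probability:", set(bigrams) - set(by_probability))
print("dropped by rank:       ", set(bigrams) - set(by_rank))
```

On this toy data the two criteria discard different bigrams: the probability score removes the lowest-probability entry overall, while the rank score removes the entry ranked last within its own history. Because different criteria disagree about which entries are expendable, comparing or combining them is a plausible thing to study, though the sketch says nothing about which one correlates better with error rate.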