A variable-length category-based n-gram language model

  • Authors:
  • T. R. Niesler; P. C. Woodland

  • Affiliations:
  • Dept. of Eng., Cambridge Univ., UK; Dept. of Eng., Cambridge Univ., UK

  • Venue:
  • ICASSP '96: Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 01
  • Year:
  • 1996

Abstract

A language model is presented that is based on word-category n-grams and ambiguous category membership, with n increased selectively to trade compactness for performance. The use of categories leads intrinsically to a compact model with the ability to generalise to unseen word sequences, and diminishes the sparseness of the training data, thereby making larger n feasible. The language model implicitly performs a statistical tagging operation, which may be used explicitly to assign category labels to untagged text. Experiments on the LOB corpus show the optimal model-building strategy to yield improved results with respect to conventional n-gram methods, and when used as a tagger, the model performs well in relation to a standard benchmark.
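As a rough illustration of the kind of model the abstract describes, the sketch below computes a sentence probability under a category-based bigram (n = 2) with ambiguous word-to-category membership, summing over all category sequences in forward-algorithm fashion. All names, probabilities, and toy data are illustrative assumptions, not the authors' implementation, which additionally varies n per context.

```python
# Minimal sketch (assumed toy data) of a category-based bigram LM with
# ambiguous category membership: P(w_1..w_n) is the sum over category
# sequences of prod_i P(w_i | c_i) * P(c_i | c_{i-1}).
from collections import defaultdict

# P(category | previous category): the category n-gram (here n = 2).
cat_bigram = {
    ("<s>", "DET"): 0.6, ("<s>", "NOUN"): 0.4,
    ("DET", "NOUN"): 0.9, ("DET", "VERB"): 0.1,
    ("NOUN", "VERB"): 0.7, ("NOUN", "NOUN"): 0.3,
}

# P(word | category): a word may belong to several categories (ambiguity).
word_given_cat = {
    "the":  {"DET": 0.5},
    "dogs": {"NOUN": 0.01, "VERB": 0.005},
    "run":  {"NOUN": 0.02, "VERB": 0.05},
}

def sentence_prob(words):
    """Forward-style sum over all category assignments for the sentence."""
    alpha = {"<s>": 1.0}  # probability mass ending in each category
    for w in words:
        new_alpha = defaultdict(float)
        for prev_cat, p_prev in alpha.items():
            for cat, p_word in word_given_cat[w].items():
                p_trans = cat_bigram.get((prev_cat, cat), 0.0)
                new_alpha[cat] += p_prev * p_trans * p_word
        alpha = new_alpha
    return sum(alpha.values())

print(sentence_prob(["the", "dogs", "run"]))
```

Replacing the sum over categories with a max (Viterbi) recovers the most likely category sequence, which is the implicit statistical tagging operation the abstract mentions.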