On the Estimation of 'Small' Probabilities by Leaving-One-Out

  • Authors:
  • Hermann Ney;Ute Essen;Reinhard Kneser

  • Venue:
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • Year:
  • 1995

Quantified Score

Hi-index 0.14

Abstract

In this paper, we apply the leaving-one-out concept to the estimation of 'small' probabilities, i.e., the case where the number of training samples is much smaller than the number of possible classes. After deriving the Turing-Good formula in this framework, we introduce several specific models in order to avoid the problems of the original Turing-Good formula. These models are the constrained model, the absolute discounting model and the linear discounting model. These models are then applied to the problem of bigram-based stochastic language modeling. Experimental results are presented for a German and an English corpus.
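
As a rough illustration of the absolute discounting idea named in the abstract, the sketch below subtracts a constant discount from every observed bigram count and redistributes the freed probability mass according to a unigram distribution. It is a minimal sketch under stated assumptions, not the paper's implementation: the function name `absolute_discounting_bigram`, the unigram backing-off distribution, the fallback value 0.5, and the choice of discount b = n1 / (n1 + 2 n2) (with n1 and n2 the numbers of bigrams seen exactly once and twice, a commonly cited leaving-one-out approximation) are all assumptions made for this example.

```python
from collections import Counter


def absolute_discounting_bigram(tokens, discount=None):
    """Return (prob, discount), where prob(w, h) estimates p(w | h).

    Illustrative sketch only; names and the unigram backoff are assumptions.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))   # N(h, w)
    history = Counter(tokens[:-1])               # N(h) = sum over w of N(h, w)
    unigram = Counter(tokens)
    total = len(tokens)

    if discount is None:
        # Assumed approximation: b = n1 / (n1 + 2 * n2), using count-of-counts.
        n1 = sum(1 for c in bigrams.values() if c == 1)
        n2 = sum(1 for c in bigrams.values() if c == 2)
        discount = n1 / (n1 + 2 * n2) if (n1 + 2 * n2) > 0 else 0.5

    # Number of distinct successor words observed after each history.
    distinct = Counter(h for (h, _w) in bigrams)

    def prob(w, h):
        if history[h] == 0:
            # Unseen history: fall back to the unigram relative frequency.
            return unigram[w] / total
        discounted = max(bigrams[(h, w)] - discount, 0.0) / history[h]
        # Mass removed from seen bigrams, spread over the unigram distribution.
        backoff_mass = discount * distinct[h] / history[h]
        return discounted + backoff_mass * unigram[w] / total

    return prob, discount


tokens = "the cat sat on the mat the cat ate".split()
prob, b = absolute_discounting_bigram(tokens)
print(b, prob("cat", "the"), prob("sat", "the"))
```

Because the discount never exceeds the smallest observed count, the mass removed from seen bigrams equals the mass handed to the backing-off distribution, so the estimates for each history sum to one over the vocabulary.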