Tagging English text with a probabilistic model

  • Authors:
  • Bernard Merialdo

  • Affiliations:
  • Institut EURECOM

  • Venue:
  • Computational Linguistics
  • Year:
  • 1994

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper we present some experiments on the use of a probabilistic model to tag English text, i.e. to assign to each word the correct tag (part of speech) in the context of the sentence. The main novelty of these experiments is the use of untagged text in the training of the model. We have used a simple triclass Markov model and are looking for the best way to estimate the parameters of this model, depending on the kind and amount of training data provided. Two approaches in particular are compared and combined:using text that has been tagged by hand and computing relative frequency counts,using text without tags and training the model as a hidden Markov process, according to a Maximum Likelihood principle.Experminents show that the best training is obtained by using as much tagged text as possible. They also show that Maximum Likelihood training, the procedure that is routinely used to estimate hidden Markov models parameters from training data, will not necessarily improve the tagging accuracy. In fact, it will generally degrade this accuracy, except when only a limited amount of hand-tagged text is available.