Learning morpho-lexical probabilities from an untagged corpus with an application to Hebrew

  • Authors:
  • Moshe Levinger; Alon Itai; Uzzi Ornan

  • Affiliations:
  • Haifa Research Laboratory; Technion; Technion

  • Venue:
  • Computational Linguistics
  • Year:
  • 1995

Abstract

This paper proposes a new approach for acquiring morpho-lexical probabilities from an untagged corpus. The approach shows how useful, nontrivial information can be extracted from an untagged corpus, information that would otherwise require laborious tagging of large corpora. The paper describes the use of these morpho-lexical probabilities as an information source for morphological disambiguation in Hebrew. The method relies primarily on the following property: a lexical entry in Hebrew may have many different word forms, some of which are ambiguous and some of which are not. Thus, a given word can be disambiguated using other word forms of the same lexical entry. Although the method was originally devised and implemented to address morphological ambiguity in Hebrew, the basic idea can be extended to similar problems in other languages with rich morphology.
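The core idea of using other word forms of the same lexical entry as evidence can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a morphological analyzer has already supplied, for each candidate analysis of an ambiguous word, a list of other forms of the same lexical entry that are unambiguous. The function names `morpho_lexical_probs` and `disambiguate` and the placeholder tokens are hypothetical. Each analysis receives a probability proportional to how often its unambiguous sibling forms occur in the untagged corpus.

```python
from collections import Counter


def morpho_lexical_probs(analyses, corpus_counts):
    """Estimate P(analysis | ambiguous word) from an untagged corpus.

    analyses: hypothetical input mapping each candidate analysis of an
        ambiguous word to other word forms of the same lexical entry that
        are assumed to be unambiguous.
    corpus_counts: token frequencies from an untagged corpus (dict or Counter).
    """
    # Evidence for an analysis = total frequency of its unambiguous sibling forms.
    # The ambiguous word itself is never counted: without tagging, its
    # occurrences cannot be attributed to any single analysis.
    evidence = {
        a: sum(corpus_counts.get(w, 0) for w in forms)
        for a, forms in analyses.items()
    }
    total = sum(evidence.values())
    if total == 0:
        # No sibling form was observed: fall back to a uniform distribution.
        return {a: 1 / len(analyses) for a in analyses}
    return {a: n / total for a, n in evidence.items()}


def disambiguate(analyses, corpus_counts):
    """Pick the analysis with the highest estimated probability."""
    probs = morpho_lexical_probs(analyses, corpus_counts)
    return max(probs, key=probs.get)


# Toy example with placeholder tokens (hypothetical, not real Hebrew data).
corpus = "x1 y1 x1 x2 y1 x1 z x2 x1".split()
counts = Counter(corpus)
analyses = {"analysis_A": ["x1", "x2"], "analysis_B": ["y1"]}
print(morpho_lexical_probs(analyses, counts))  # {'analysis_A': 0.75, 'analysis_B': 0.25}
print(disambiguate(analyses, counts))          # analysis_A
```

In the toy corpus, the sibling forms of `analysis_A` occur 6 times and the sibling form of `analysis_B` occurs 2 times, giving estimates of 0.75 and 0.25. A realistic system would likely need a more careful treatment of sparse counts than the uniform fallback used here.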