Tiered Tagging and Combined Language Models Classifiers

  • Authors: Dan Tufis
  • Venue: TSD '99 Proceedings of the Second International Workshop on Text, Speech and Dialogue
  • Year: 1999

Abstract

We address the problem of morpho-syntactic disambiguation of arbitrary texts in a highly inflectional natural language. We use a large tagset (615 tags) that is EAGLES and MULTEXT compliant [5]. The large tagset is internally mapped onto a reduced one (82 tags), which is used for statistical disambiguation; a text disambiguated with the reduced tagset then undergoes a recovery process that restores the information left out from the large tagset. This two-step process is called tiered tagging. To further improve tagging accuracy, we use a combined language models classifier, a procedure that interpolates the results of tagging the same text with several register-specific language models.
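
The two ideas in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tag names, the toy lexicon, the `recover` and `combine` helpers, and the weights are hypothetical and are not taken from the paper. In particular, the abstract does not say how recovery or interpolation is carried out; here recovery is approximated by a lexicon lookup filtered by the chosen reduced tag, and interpolation by a weighted per-token vote.

```python
from collections import defaultdict

# Hypothetical mapping from a large tagset to a reduced one (toy stand-in
# for the 615 -> 82 mapping mentioned in the abstract).
FULL_TO_REDUCED = {
    "Ncfsrn": "NOUN",
    "Ncfson": "NOUN",
    "Vmip3s": "VERB3",
    "Vmip3p": "VERB3",
}

# Hypothetical lexicon: each word form lists the full tags it may carry.
LEXICON = {
    "casa": ["Ncfsrn", "Vmip3s"],
    "case": ["Ncfsrn", "Ncfson"],
}


def recover(word, reduced_tag):
    """Second tier (assumed strategy): keep the word's full tags that map
    onto the reduced tag chosen by the statistical tagger."""
    return [
        full for full in LEXICON.get(word, [])
        if FULL_TO_REDUCED.get(full) == reduced_tag
    ]


def combine(taggings, weights):
    """Assumed combination scheme: weighted per-token vote over the outputs
    of several taggers, one per register-specific language model."""
    combined = []
    for position in range(len(taggings[0])):
        scores = defaultdict(float)
        for tags, weight in zip(taggings, weights):
            scores[tags[position]] += weight
        combined.append(max(scores, key=scores.get))
    return combined


words = ["casa", "case"]
# Reduced-tag outputs of three hypothetical register-specific taggers.
taggings = [
    ["NOUN", "NOUN"],
    ["VERB3", "NOUN"],
    ["NOUN", "NOUN"],
]
reduced = combine(taggings, weights=[0.5, 0.3, 0.2])
full = [recover(w, t) for w, t in zip(words, reduced)]
```

In this toy run, "casa" recovers a single full tag, while "case" keeps two candidates because both of its full tags share the same reduced tag; a real recovery step would need further rules or context to resolve such residual ambiguity.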