Improving part-of-speech tagging using lexicalized HMMs

  • Authors:
  • Ferran Pla; Antonio Molina

  • Affiliations:
  • Departament de Sistemes Informàtics i Computació, Universitat Politècnica de València, Camí de Vera, s/n. 46020 València SPAIN e-mail: fpla@dsci.upv.es

  • Venue:
  • Natural Language Engineering
  • Year:
  • 2004

Abstract

We introduce a simple method for building Lexicalized Hidden Markov Models (L-HMMs) to improve the precision of part-of-speech tagging. The technique enriches the contextual language model by taking into account an empirically selected set of words. We evaluated different lexicalization criteria on the Penn Treebank corpus using the TnT tagger. Lexicalization yielded a reduction of about 6% in tagging error on unseen test data, without reducing the efficiency of the system. We also studied how linguistic resources, such as dictionaries and morphological analyzers, improve tagging performance. Furthermore, an exhaustive experimental comparison shows that Lexicalized HMMs yield results that are better than or similar to those of other state-of-the-art part-of-speech tagging approaches. Finally, we applied Lexicalized HMMs to the Spanish corpus LexEsp.
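
As an illustration only, one way to enrich the contextual model with selected words is to relabel the training corpus so that each selected word carries a word-specific tag, and then train an unmodified HMM tagger on the relabeled data. The abstract does not spell out the exact procedure, so the Python sketch below is an assumption: the function names, the `~` separator, and the frequency-based selection criterion are all hypothetical and are not taken from the paper.

```python
from collections import Counter

SEP = "~"  # hypothetical separator; assumed not to occur in any tag


def most_frequent_words(tagged_sentences, n):
    """One possible (assumed) selection criterion: the n most frequent words."""
    counts = Counter(w.lower() for sent in tagged_sentences for w, _ in sent)
    return {w for w, _ in counts.most_common(n)}


def lexicalize(tagged_sentences, selected_words):
    """Replace the tag of every selected word with a word-specific tag 'TAG~word'."""
    return [
        [(w, f"{t}{SEP}{w.lower()}" if w.lower() in selected_words else t)
         for w, t in sent]
        for sent in tagged_sentences
    ]


def delexicalize(tags):
    """Map word-specific tags back to the original tag set for evaluation."""
    return [t.split(SEP, 1)[0] for t in tags]


corpus = [
    [("The", "DT"), ("dog", "NN"), ("that", "WDT"), ("barked", "VBD")],
    [("I", "PRP"), ("know", "VBP"), ("that", "IN"), ("the", "DT"),
     ("dog", "NN"), ("barked", "VBD")],
]

# Lexicalize a single, hand-picked word for the example.
lexicalized = lexicalize(corpus, {"that"})
print(lexicalized[0])
print(delexicalize([t for _, t in lexicalized[1]]))
```

In this sketch the two uses of "that" become the distinct states `WDT~that` and `IN~that`, so tag n-gram statistics estimated from the relabeled corpus can condition on the word itself. Because only the corpus is transformed, the tagger's decoding procedure stays the same; the state set simply grows with the number of selected words.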