Portuguese Part-of-Speech Tagging Using Entropy Guided Transformation Learning

  • Authors:
  • Cícero Nogueira Dos Santos;Ruy L. Milidiú;Raúl P. Rentería

  • Affiliations:
  • Departamento de Informática, Pontifícia Universidade Católica, Rio de Janeiro, Brazil;Departamento de Informática, Pontifícia Universidade Católica, Rio de Janeiro, Brazil;Fast Search & Transfer,

  • Venue:
  • PROPOR '08: Proceedings of the 8th International Conference on Computational Processing of the Portuguese Language
  • Year:
  • 2008

Abstract

Entropy Guided Transformation Learning (ETL) is a new machine learning strategy that combines the advantages of Decision Trees (DT) and Transformation Based Learning (TBL). In this work, we apply the ETL framework to Portuguese Part-of-Speech (POS) tagging. We use two different corpora: Mac-Morpho and Tycho Brahe. ETL achieves the best results reported so far for machine learning based POS tagging of both corpora, with accuracies of 96.75% for Mac-Morpho and 96.64% for Tycho Brahe. ETL also provides a new training strategy that accelerates transformation learning; for the Mac-Morpho corpus, this corresponds to a factor of three speedup.
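
The abstract does not spell out how the "entropy guided" step works. As a rough illustration only, the sketch below assumes that context features are ranked by information gain, as in decision tree induction, so that the most informative ones could serve as candidates for TBL rule templates. The feature names (`prev_tag`, `suffix`), the toy examples, and the helper functions are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, feature, label="tag"):
    """Entropy reduction obtained by splitting `rows` on `feature`."""
    base = entropy([r[label] for r in rows])
    groups = defaultdict(list)
    for r in rows:
        groups[r[feature]].append(r[label])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Made-up toy examples (not from Mac-Morpho or Tycho Brahe).
rows = [
    {"prev_tag": "ART", "suffix": "a", "tag": "N"},
    {"prev_tag": "ART", "suffix": "o", "tag": "N"},
    {"prev_tag": "N", "suffix": "a", "tag": "V"},
    {"prev_tag": "N", "suffix": "o", "tag": "V"},
]

# Rank the hypothetical features from most to least informative.
ranked = sorted(
    ["prev_tag", "suffix"],
    key=lambda f: information_gain(rows, f),
    reverse=True,
)
print(ranked)
```

In this toy data the previous tag fully determines the current tag, while the suffix carries no information, so `prev_tag` is ranked first. A real system would compute such scores over a full annotated corpus and a much larger feature set.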