Tagging Portuguese with a Spanish tagger using cognates

  • Authors:
  • Jirka Hana; Anna Feldman; Chris Brew; Luiz Amaral

  • Affiliations:
  • The Ohio State University; The Ohio State University; The Ohio State University; The Ohio State University

  • Venue:
  • CrossLangInduction '06 Proceedings of the International Workshop on Cross-Language Knowledge Induction
  • Year:
  • 2006

Abstract

We describe a knowledge- and resource-light system for automatic morphological analysis and tagging of Brazilian Portuguese. We avoid labor-intensive resources, in particular large annotated corpora and lexicons. Instead, we use (i) an annotated corpus of Peninsular Spanish, a language related to Portuguese, (ii) an unannotated corpus of Portuguese, and (iii) a description of Portuguese morphology at the level of a basic grammar book. We extend our earlier work (Hana et al., 2004; Feldman et al., 2006) by proposing an alternative algorithm for cognate transfer that effectively projects the Spanish emission probabilities into Portuguese. Our experiments require minimal new human effort and show a 21% error reduction over even emissions on a fine-grained tagset.
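
The abstract does not spell out the cognate-transfer algorithm, so the following is only a rough sketch of the general idea, not the authors' method. It assumes that Spanish and Portuguese share a tagset, that a morphological analyzer supplies candidate tags for each Portuguese word, and that cognates can be approximated by plain string similarity (here Python's `difflib.SequenceMatcher`). The function names, the threshold value, and the toy counts are all hypothetical. When no cognate is found, the sketch falls back to a uniform distribution over the candidate tags, which is one plausible reading of "even emissions".

```python
from difflib import SequenceMatcher


def best_cognate(pt_word, es_words, threshold=0.7):
    """Return the Spanish word most similar to pt_word, or None if no
    word reaches the (hypothetical) similarity threshold."""
    best, best_score = None, threshold
    for es_word in es_words:
        score = SequenceMatcher(None, pt_word, es_word).ratio()
        if score >= best_score:
            best, best_score = es_word, score
    return best


def project_tag_distribution(pt_word, pt_candidate_tags, es_tag_counts,
                             threshold=0.7):
    """Estimate a tag distribution for a Portuguese word.

    pt_candidate_tags: tags proposed by a (hypothetical) morphological analyzer.
    es_tag_counts: Spanish word -> {tag: count}, from the annotated corpus.
    """
    # Baseline: spread probability evenly over the candidate tags.
    uniform = {t: 1.0 / len(pt_candidate_tags) for t in pt_candidate_tags}

    cognate = best_cognate(pt_word, es_tag_counts, threshold)
    if cognate is None:
        return uniform

    # Keep only the Spanish counts for tags the analyzer allows.
    counts = {t: es_tag_counts[cognate].get(t, 0) for t in pt_candidate_tags}
    total = sum(counts.values())
    if total == 0:
        return uniform
    return {t: c / total for t, c in counts.items()}


# Toy Spanish counts, invented for illustration only.
es_tag_counts = {
    "canto": {"NOUN": 3, "VERB": 1},
    "importante": {"ADJ": 5},
}

print(project_tag_distribution("canto", ["NOUN", "VERB"], es_tag_counts))
print(project_tag_distribution("xyz", ["NOUN", "VERB"], es_tag_counts))
```

The sketch yields a per-word distribution over tags; turning it into HMM emission probabilities would additionally require tag frequencies, which are left out here. A real system would also need a similarity measure that accounts for regular Spanish-Portuguese sound correspondences, since raw string similarity misses pairs such as "nación" and "nação".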