Comparing two Markov methods for part-of-speech tagging of Portuguese

  • Authors:
  • Fábio N. Kepler; Marcelo Finger

  • Affiliations:
  • Institute of Mathematics and Statistics, University of São Paulo (USP); Institute of Mathematics and Statistics, University of São Paulo (USP)

  • Venue:
  • IBERAMIA-SBIA'06: Proceedings of the 2nd International Joint Conference, 10th Ibero-American Conference on AI and 18th Brazilian Conference on Advances in Artificial Intelligence
  • Year:
  • 2006

Abstract

A wide variety of statistical methods have been applied to part-of-speech (PoS) tagging, the task of associating each word in a text with its corresponding PoS. Most of these methods analyse a small, fixed neighborhood of words, imposing some form of Markov restriction. In this work we implement and compare a fixed-length hidden Markov model (HMM) with a variable-length Markov chain (VLMC); the latter is, in principle, capable of detecting long-distance dependencies. We show that the VLMC model achieves better accuracy with almost the same tagging time, and that it also performs very well in training time. However, the VLMC method fails in practice to capture truly long-distance dependencies, and we analyse the reasons for this behaviour.
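
To make the contrast between the two Markov restrictions concrete, the following is a toy sketch and not the authors' implementation. It only illustrates how the conditioning context is chosen: a fixed-length model always looks at the last `order` tags, while a variable-length model uses the longest suffix of the history that it has kept. All function names, the toy tag set, and the `min_count` frequency threshold are assumptions made for illustration; actual VLMC training typically relies on a statistical pruning criterion rather than a raw count threshold, and a full tagger would also need word emission probabilities and a decoding step.

```python
from collections import Counter, defaultdict


def count_contexts(tag_sequences, max_order):
    """Count which tag follows every context of length 0..max_order."""
    counts = defaultdict(Counter)
    for tags in tag_sequences:
        for i, tag in enumerate(tags):
            for k in range(max_order + 1):
                if i - k < 0:
                    break  # not enough history for a context of length k
                counts[tuple(tags[i - k:i])][tag] += 1
    return counts


def fixed_context(history, order):
    """Fixed-length restriction: always condition on the last `order` tags."""
    return tuple(history[-order:]) if order else ()


def variable_context(history, counts, max_order, min_count):
    """Variable-length restriction (toy rule, an assumption of this sketch):
    use the longest suffix of `history` observed at least `min_count` times."""
    for k in range(min(max_order, len(history)), 0, -1):
        ctx = tuple(history[-k:])
        if sum(counts.get(ctx, Counter()).values()) >= min_count:
            return ctx
    return ()  # fall back to the empty (unigram) context


def next_tag_probs(context, counts):
    """Relative-frequency estimate of P(next tag | context)."""
    following = counts.get(context, Counter())
    total = sum(following.values())
    return {tag: c / total for tag, c in following.items()}


if __name__ == "__main__":
    # Hypothetical tag sequences, for illustration only.
    corpus = [
        ["DET", "N", "V", "DET", "N"],
        ["DET", "ADJ", "N", "V"],
        ["DET", "N", "V", "DET", "ADJ", "N"],
    ]
    counts = count_contexts(corpus, max_order=3)
    history = ["DET", "N"]
    print(fixed_context(history, 1))
    print(variable_context(history, counts, max_order=3, min_count=2))
    print(variable_context(history, counts, max_order=3, min_count=3))
```

With a low threshold the variable-length lookup keeps the two-tag context, and with a stricter one it backs off to a single tag. In this toy setting, that backing-off is one way a variable-length model can end up using only short contexts when longer ones are too rare in the training data.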