Evaluation of TnT Tagger for Spanish

  • Authors:
  • Raúl Morales Carrasco; Alexander Gelbukh

  • Venue:
  • ENC '03 Proceedings of the 4th Mexican International Conference on Computer Science
  • Year:
  • 2003

Abstract

A part-of-speech (POS) tagger is a necessary module in many natural language text processing tasks. A POS tagger is a program that accepts raw, unprocessed text as input and adds to each word a tag specifying its grammatical properties, such as part of speech, number, person, etc. One popular POS tagger, the TnT tagger, has been extensively tested for English and some other languages. This paper reports on its evaluation for the Spanish language. An error analysis is reported, explaining how some specific features of the Spanish language affect tagger performance. On Spanish texts, TnT shows an overall tagging accuracy between 92.95% and 95.84%; specifically, between 95.47% and 98.56% on known words and between 75.57% and 83.49% on unknown words. The results show that TnT has reached a good level of maturity and is helpful enough for NLP tasks.
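The accuracy figures above are reported separately for known and unknown words. A minimal sketch of how such a split could be computed, assuming that a word counts as "known" when its exact form appears in the lexicon built from the training corpus. The function name `tagging_accuracy` and the toy tags in the usage example are hypothetical illustrations, not taken from the paper or from the TnT distribution.

```python
from __future__ import annotations

from typing import Sequence


def tagging_accuracy(
    gold: Sequence[tuple[str, str]],
    predicted: Sequence[str],
    lexicon: set[str],
) -> dict[str, float]:
    """Return overall, known-word and unknown-word tagging accuracy.

    gold:      (word, gold_tag) pairs for the test corpus.
    predicted: tags assigned by the tagger, aligned one-to-one with gold.
    lexicon:   word forms seen in training (assumed definition of "known").
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # counts[category] = [correct, total]
    counts = {"known": [0, 0], "unknown": [0, 0]}
    for (word, gold_tag), pred_tag in zip(gold, predicted):
        category = "known" if word in lexicon else "unknown"
        counts[category][1] += 1
        if pred_tag == gold_tag:
            counts[category][0] += 1

    def ratio(correct: int, total: int) -> float:
        # NaN signals that a category had no tokens at all.
        return correct / total if total else float("nan")

    correct = counts["known"][0] + counts["unknown"][0]
    total = counts["known"][1] + counts["unknown"][1]
    return {
        "overall": ratio(correct, total),
        "known": ratio(*counts["known"]),
        "unknown": ratio(*counts["unknown"]),
    }


if __name__ == "__main__":
    # Toy data with made-up tags; "croquetas" is absent from the lexicon.
    lexicon = {"el", "perro", "come"}
    gold = [("el", "DA"), ("perro", "NC"), ("come", "VM"), ("croquetas", "NC")]
    predicted = ["DA", "NC", "VM", "AQ"]
    print(tagging_accuracy(gold, predicted, lexicon))
```

Because overall accuracy is the token-weighted average of the known-word and unknown-word accuracies, it stays close to the known-word figure whenever unknown words make up a small share of the test text.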