A cost sensitive part-of-speech tagging: differentiating serious errors from minor errors

  • Authors:
  • Hyun-Je Song; Jeong-Woo Son; Tae-Gil Noh; Seong-Bae Park; Sang-Jo Lee

  • Affiliations:
  • Kyungpook Nat'l Univ., Daegu, Korea; Kyungpook Nat'l Univ., Daegu, Korea; Heidelberg University, Heidelberg, Germany; Kyungpook Nat'l Univ., Daegu, Korea, and University of Illinois at Chicago; Kyungpook Nat'l Univ., Daegu, Korea

  • Venue:
  • ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
  • Year:
  • 2012

Abstract

All types of part-of-speech (POS) tagging errors have been equally treated by existing taggers. However, the errors are not equally important, since some errors affect the performance of subsequent natural language processing (NLP) tasks seriously while others do not. This paper aims to minimize these serious errors while retaining the overall performance of POS tagging. Two gradient loss functions are proposed to reflect the different types of errors. They are designed to assign a larger cost to serious errors and a smaller one to minor errors. Through a set of POS tagging experiments, it is shown that the classifier trained with the proposed loss functions reduces serious errors compared to state-of-the-art POS taggers. In addition, the experimental result on text chunking shows that fewer serious errors help to improve the performance of subsequent NLP tasks.
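
The idea of charging serious errors more than minor ones can be illustrated with a small sketch. The abstract does not define the two proposed loss functions, so the code below is not the authors' method: it uses a generic margin-rescaled multiclass hinge loss as a stand-in. The tag-to-category table, the cost values (0.5 and 1.0), and the names `CATEGORY`, `error_cost`, and `cost_sensitive_hinge` are all assumptions made for illustration, with "serious" taken to mean confusing tags from different coarse categories.

```python
# Hypothetical grouping of a few Penn Treebank tags into coarse categories.
# This table is an illustrative assumption, not taken from the paper.
CATEGORY = {"NN": "noun", "NNS": "noun", "VB": "verb", "VBD": "verb", "JJ": "adj"}


def error_cost(gold, pred, minor=0.5, serious=1.0):
    """Return 0 for a correct tag, a small cost for a within-category
    confusion, and a larger cost for a cross-category confusion."""
    if gold == pred:
        return 0.0
    return minor if CATEGORY[gold] == CATEGORY[pred] else serious


def cost_sensitive_hinge(scores, gold):
    """Margin-rescaled multiclass hinge loss.

    `scores` maps each candidate tag to the classifier's score for one token.
    A rival tag must trail the gold tag by at least its error cost, so
    costly (serious) confusions demand a wider margin than minor ones.
    """
    return max(
        0.0,
        max(error_cost(gold, tag) + s - scores[gold] for tag, s in scores.items()),
    )


if __name__ == "__main__":
    # Two rivals with the same score: NNS (minor) and VB (serious).
    token_scores = {"NN": 1.0, "NNS": 0.8, "VB": 0.8, "VBD": -1.0, "JJ": -1.0}
    print(cost_sensitive_hinge(token_scores, "NN"))
```

In this sketch, the two rival tags receive the same score, yet only the cross-category rival (`VB`) determines the loss, because its required margin is larger. A classifier trained against such a loss is pushed harder to separate tags whose confusion is assumed to be more damaging.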