An error-driven word-character hybrid model for joint Chinese word segmentation and POS tagging

  • Authors:
  • Canasai Kruengkrai;Kiyotaka Uchimoto;Jun'ichi Kazama;Yiou Wang;Kentaro Torisawa;Hitoshi Isahara

  • Affiliations:
  • Kobe University, Nada-ku, Kobe Japan and National Institute of Information and Communications Technology, Seika-cho, Soraku-gun, Kyoto Japan;National Institute of Information and Communications Technology, Seika-cho, Soraku-gun, Kyoto Japan;National Institute of Information and Communications Technology, Seika-cho, Soraku-gun, Kyoto Japan;National Institute of Information and Communications Technology, Seika-cho, Soraku-gun, Kyoto Japan;National Institute of Information and Communications Technology, Seika-cho, Soraku-gun, Kyoto Japan;Kobe University, Nada-ku, Kobe Japan and National Institute of Information and Communications Technology, Seika-cho, Soraku-gun, Kyoto Japan

  • Venue:
  • ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1
  • Year:
  • 2009

Abstract

In this paper, we present a discriminative word-character hybrid model for joint Chinese word segmentation and POS tagging. The hybrid model offers high performance because it can handle both known and unknown words. We describe strategies that yield a good balance in learning the characteristics of known and unknown words, and we propose an error-driven policy that achieves this balance by acquiring examples of unknown words from particular errors in a training corpus. We describe an efficient framework for training the model based on the Margin Infused Relaxed Algorithm (MIRA), evaluate our approach on the Penn Chinese Treebank, and show that it achieves superior performance compared to the state-of-the-art approaches reported in the literature.
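
The abstract names MIRA as the training algorithm but gives no details of the update. As a rough illustration only, the following Python sketch shows the common single-constraint (1-best) form of a MIRA update over sparse feature vectors. It is not the authors' implementation: the function name mira_update, the clipping constant C, the loss value, and the feature strings are all assumptions made for this example, and the paper's actual variant (for instance, how many candidates it uses or how loss is measured) may differ.

```python
from typing import Dict

FeatureVector = Dict[str, float]


def dot(weights: FeatureVector, feats: FeatureVector) -> float:
    """Sparse dot product between a weight vector and a feature vector."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def mira_update(
    weights: FeatureVector,
    gold_feats: FeatureVector,
    pred_feats: FeatureVector,
    loss: float,
    C: float = 1.0,
) -> float:
    """Apply one 1-best MIRA step in place and return the step size tau.

    Assumption: a single constraint (gold vs. current best prediction),
    with tau clipped to the interval [0, C].
    """
    # Difference between gold and predicted feature vectors.
    delta = dict(gold_feats)
    for name, value in pred_feats.items():
        delta[name] = delta.get(name, 0.0) - value

    sq_norm = sum(value * value for value in delta.values())
    if sq_norm == 0.0:
        # Identical feature vectors: nothing to learn from this example.
        return 0.0

    # Smallest step that makes the score margin at least as large as the loss.
    margin = dot(weights, delta)
    tau = max(0.0, min(C, (loss - margin) / sq_norm))

    for name, value in delta.items():
        if value != 0.0:
            weights[name] = weights.get(name, 0.0) + tau * value
    return tau


if __name__ == "__main__":
    # Hypothetical features for a gold and a mispredicted analysis.
    w: FeatureVector = {}
    gold = {"word=中国|tag=NR": 1.0}
    pred = {"char=中|tag=NN": 1.0}
    print(mira_update(w, gold, pred, loss=1.0))
    print(w)
```

After one step the gold analysis outscores the prediction by exactly the loss, so repeating the same update leaves the weights unchanged; this "smallest change that fixes the margin" behavior is what distinguishes MIRA from a plain perceptron update.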