Training conditional random fields using incomplete annotations

  • Authors:
  • Yuta Tsuboi;Hisashi Kashima;Hiroki Oda;Shinsuke Mori;Yuji Matsumoto

  • Affiliations:
  • IBM Research, IBM Japan, Ltd, Yamato, Kanagawa, Japan;IBM Research, IBM Japan, Ltd, Yamato, Kanagawa, Japan;Shinagawa, Tokyo, Japan;Kyoto University, Sakyo-ku, Kyoto, Japan;Nara Institute of Science and Technology, Ikoma, Nara, Japan

  • Venue:
  • COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
  • Year:
  • 2008

Abstract

We address corpus-building situations in which completely annotating the whole corpus is time-consuming and unrealistic. In such situations, annotation is done only on the crucial parts of sentences, or contains unresolved label ambiguities. We propose a parameter estimation method for Conditional Random Fields (CRFs) that enables us to use such incomplete annotations. We show promising results for our method as applied to two types of NLP tasks: a domain adaptation task for Japanese word segmentation using partial annotations, and a part-of-speech tagging task using ambiguous tags in the Penn Treebank corpus.
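The abstract does not spell out the estimation objective, so the following is only a minimal sketch of one common way to train a linear-chain CRF on incomplete annotations: maximize the marginal likelihood of all label sequences consistent with the annotation. It assumes each position carries a set of permitted labels (a single label where annotated, several where the tag is ambiguous, all labels where unannotated). The names `emit`, `trans`, `allowed`, `constrained_log_z`, and `marginal_log_likelihood` are hypothetical and chosen for illustration.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def constrained_log_z(emit, trans, allowed):
    """Forward algorithm restricted to the labels permitted at each position.

    emit[t][y]  : score of label y at position t (assumed precomputed)
    trans[a][b] : score of moving from label a to label b
    allowed[t]  : set of labels consistent with the annotation at position t
    """
    n_labels = len(trans)
    alpha = [emit[0][y] if y in allowed[0] else -math.inf
             for y in range(n_labels)]
    for t in range(1, len(emit)):
        alpha = [
            emit[t][y] + logsumexp([alpha[p] + trans[p][y]
                                    for p in range(n_labels)])
            if y in allowed[t] else -math.inf
            for y in range(n_labels)
        ]
    return logsumexp(alpha)


def marginal_log_likelihood(emit, trans, allowed):
    """Log-probability mass of all label sequences consistent with `allowed`."""
    everything = [set(range(len(trans)))] * len(emit)
    return (constrained_log_z(emit, trans, allowed)
            - constrained_log_z(emit, trans, everything))
```

Under these assumptions, a fully annotated sentence reduces to the usual CRF log-likelihood, and a sentence with no annotation contributes zero, so complete and incomplete data can share one objective.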