Training conditional random fields using incomplete annotations

Authors:
Yuta Tsuboi;Hisashi Kashima;Hiroki Oda;Shinsuke Mori;Yuji Matsumoto
Affiliations:
IBM Research, IBM Japan, Ltd, Yamato, Kanagawa, Japan;IBM Research, IBM Japan, Ltd, Yamato, Kanagawa, Japan;Shinagawa, Tokyo, Japan;Kyoto University, Sakyo-ku, Kyoto, Japan;Nara Institute of Science and Technology, Ikoma, Nara, Japan
Venue:
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Year:
2008

Citing 9

Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning

Building a large annotated corpus of English: the Penn Treebank
Computational Linguistics - Special issue on using large corpora: II

Inside-outside reestimation from partially bracketed corpora
ACL '92 Proceedings of the 30th annual meeting on Association for Computational Linguistics

Shallow parsing with conditional random fields
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1

A maximum entropy Chinese character-based parser
EMNLP '03 Proceedings of the 2003 conference on Empirical methods in natural language processing

Chinese segmentation and new word detection using conditional random fields
COLING '04 Proceedings of the 20th international conference on Computational Linguistics

A discriminative matching approach to word alignment
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing

Partial training for a lexicalized-grammar parser
HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics

Corrective feedback and persistent learning for information extraction
Artificial Intelligence

Cited 6

An efficient algorithm for unsupervised word segmentation with branching entropy and MDL
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Pointwise prediction for robust, adaptable Japanese morphological analysis
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2

Aspects of semi-supervised and active learning in conditional random fields
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part III

Non-parametric Bayesian segmentation of Japanese noun phrases
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing

Exploiting partial annotations with EM training
WILS '12 Proceedings of the NAACL-HLT Workshop on the Induction of Linguistic Structure

Learning thread reply structure on patient forums
Proceedings of the 2013 international workshop on Data management & analytics for healthcare

Abstract

We address corpus-building situations in which completely annotating the whole corpus is time-consuming and unrealistic. In such situations, annotation is done only on the crucial parts of sentences, or it contains unresolved label ambiguities. We propose a parameter estimation method for Conditional Random Fields (CRFs) that enables us to use such incomplete annotations. We show promising results for our method on two types of NLP tasks: a domain adaptation task for Japanese word segmentation using partial annotations, and a part-of-speech tagging task using ambiguous tags in the Penn Treebank corpus.
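
The abstract does not spell out the training objective. One common way to train a linear-chain CRF on incomplete annotations is to maximize the marginal likelihood of all label sequences consistent with the annotation, and the sketch below illustrates that idea under this assumption. It is not necessarily the authors' exact formulation. The names `emit`, `trans`, `allowed`, `constrained_log_z`, and `partial_log_likelihood` are hypothetical and introduced only for illustration. A partially annotated position lists a single allowed label, an ambiguous position lists several, and an unannotated position lists all of them.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def constrained_log_z(emit, trans, allowed):
    """Log of the summed exp-scores of all sequences y with y[t] in allowed[t].

    emit[t][k]  : score of label k at position t (assumed precomputed)
    trans[j][k] : score of the transition from label j to label k
    allowed[t]  : set of labels permitted at position t (must be non-empty)
    """
    n_labels = len(trans)
    # Forward pass in log space; disallowed labels get log-weight -inf.
    alpha = [emit[0][k] if k in allowed[0] else -math.inf
             for k in range(n_labels)]
    for t in range(1, len(emit)):
        alpha = [
            emit[t][k] + logsumexp([alpha[j] + trans[j][k]
                                    for j in range(n_labels)])
            if k in allowed[t] else -math.inf
            for k in range(n_labels)
        ]
    return logsumexp(alpha)


def partial_log_likelihood(emit, trans, allowed):
    """Log-probability that the label sequence falls in the allowed set."""
    every_label = set(range(len(trans)))
    unconstrained = [every_label] * len(emit)
    return (constrained_log_z(emit, trans, allowed)
            - constrained_log_z(emit, trans, unconstrained))


# Toy example: 3 positions, 2 labels; only the first and last are annotated.
emit = [[1.0, 0.0], [0.5, 0.2], [0.0, 1.5]]
trans = [[0.3, -0.1], [-0.2, 0.4]]
allowed = [{0}, {0, 1}, {1}]
print(partial_log_likelihood(emit, trans, allowed))
```

With every position fully annotated, this quantity reduces to the usual CRF log-likelihood. With no constraints at all, it is zero, so unannotated sentences contribute nothing to training.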