An error-driven word-character hybrid model for joint Chinese word segmentation and POS tagging

Authors:
Canasai Kruengkrai;Kiyotaka Uchimoto;Jun'ichi Kazama;Yiou Wang;Kentaro Torisawa;Hitoshi Isahara
Affiliations:
Kobe University, Nada-ku, Kobe Japan and National Institute of Information and Communications Technology, Seika-cho, Soraku-gun, Kyoto Japan;National Institute of Information and Communications Technology, Seika-cho, Soraku-gun, Kyoto Japan;National Institute of Information and Communications Technology, Seika-cho, Soraku-gun, Kyoto Japan;National Institute of Information and Communications Technology, Seika-cho, Soraku-gun, Kyoto Japan;National Institute of Information and Communications Technology, Seika-cho, Soraku-gun, Kyoto Japan;Kobe University, Nada-ku, Kobe Japan and National Institute of Information and Communications Technology, Seika-cho, Soraku-gun, Kyoto Japan
Venue:
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1
Year:
2009

Abstract

In this paper, we present a discriminative word-character hybrid model for joint Chinese word segmentation and POS tagging. The hybrid model achieves high performance because it can handle both known and unknown words. We describe strategies that strike a good balance between learning the characteristics of known words and those of unknown words, and we propose an error-driven policy that achieves this balance by acquiring examples of unknown words from particular errors in a training corpus. We also describe an efficient framework for training the model based on the Margin Infused Relaxed Algorithm (MIRA), evaluate our approach on the Penn Chinese Treebank, and show that it outperforms the state-of-the-art approaches reported in the literature.
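
The abstract names MIRA as the training algorithm but does not spell out the update. As a rough illustration only, the sketch below shows one common single-best, clipped form of a MIRA-style update on sparse feature vectors. It is an assumption about the general technique, not the authors' exact formulation; the function name `mira_update`, the feature names, and the clipping constant `C` are all hypothetical.

```python
from collections import Counter


def mira_update(weights, gold_feats, pred_feats, loss, C=1.0):
    """Apply one single-best MIRA-style update in place and return the step size.

    weights, gold_feats, pred_feats: dicts mapping feature name -> value.
    loss: cost of predicting pred instead of gold (e.g. number of wrong tags).
    C: upper bound on the step size (hypothetical default).
    """
    # Feature difference f(gold) - f(pred).
    diff = Counter(gold_feats)
    diff.subtract(pred_feats)
    diff = {k: v for k, v in diff.items() if v != 0}

    norm_sq = sum(v * v for v in diff.values())
    if norm_sq == 0:
        return 0.0  # identical feature vectors: nothing to update

    # Current margin of gold over pred under the model.
    margin = sum(weights.get(k, 0.0) * v for k, v in diff.items())

    # Smallest step that makes the margin at least the loss, clipped to [0, C].
    tau = min(C, max(0.0, loss - margin) / norm_sq)
    for k, v in diff.items():
        weights[k] = weights.get(k, 0.0) + tau * v
    return tau


# Toy usage with made-up features for one training sentence.
w = {}
gold = {"w=研究|t=NN": 1, "bigram=NN_VV": 1}
pred = {"w=研究|t=VV": 1, "bigram=NN_VV": 1}
step = mira_update(w, gold, pred, loss=1.0)
```

The update moves the weights just far enough for the gold analysis to outscore the prediction by a margin equal to the loss, so repeating the same call immediately afterwards leaves the weights unchanged.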