A stacked sub-word model for joint Chinese word segmentation and part-of-speech tagging

Authors:
Weiwei Sun
Affiliations:
Saarland University, Saarbrücken, Germany
Venue:
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Year:
2011

Citing 14

Original Contribution: Stacked generalization
Neural Networks

Stacked regressions
Machine Learning

The Penn Chinese TreeBank: Phrase structure annotation of a large corpus
Natural Language Engineering

Chunking with support vector machines
NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies

A stacked, voted, stacked model for named entity recognition
CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4

Chinese word segmentation as LMR tagging
SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17

Online Passive-Aggressive Algorithms
The Journal of Machine Learning Research

Word lattice reranking for Chinese word segmentation and part-of-speech tagging
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1

Stacking dependency parsers
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing

Subword-based tagging by conditional random fields for Chinese word segmentation
NAACL-Short '06 Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

Stacked sequential learning
IJCAI'05 Proceedings of the 19th international joint conference on Artificial intelligence

An error-driven word-character hybrid model for joint Chinese word segmentation and POS tagging
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1

A fast decoder for joint word segmentation and POS-tagging using a single discriminative model
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Word-based and character-based word segmentation models: comparison and combination
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters

Cited 10

Enhancing Chinese word segmentation using unlabeled data
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing

Integrating Generative and Discriminative Character-Based Models for Chinese Word Segmentation
ACM Transactions on Asian Language Information Processing (TALIP)

Reducing approximation and estimation errors for Chinese lexical processing with heterogeneous annotations
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1

Incremental joint approach to word segmentation, POS tagging, and dependency parsing in Chinese
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1

Exploring deterministic constraints: from a constrained English POS tagger to an efficient ILP solution to Chinese word segmentation
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1

Iterative annotation transformation with predict-self reestimation for Chinese word segmentation
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Joint Chinese word segmentation, POS tagging and parsing
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Part-of-speech tagging for Chinese-English mixed texts with dynamic features
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Unified dependency parsing of Chinese morphological and syntactic structures
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

The Left and Right Context of a Word: Overlapping Chinese Syllable Word Segmentation with Minimal Context
ACM Transactions on Asian Language Information Processing (TALIP)

Quantified Score

Hi-index	0.03

Abstract

The large combined search space of joint word segmentation and part-of-speech (POS) tagging makes efficient decoding very hard. As a result, effective high-order features that represent rich contexts are inconvenient to use. In this work, we propose a novel stacked sub-word model for this task that addresses both efficiency and effectiveness. Our solution is a two-step process. First, one word-based segmenter, one character-based segmenter, and one local character classifier are trained to produce coarse segmentation and POS information. Second, the outputs of the three predictors are merged into sub-word sequences, which are further bracketed and labeled with POS tags by a fine-grained sub-word tagger. The coarse-to-fine search scheme is efficient, and in the sub-word tagging step rich contextual features can be approximately derived. Evaluation on the Penn Chinese Treebank shows that our model yields improvements over the best system reported in the literature.
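
The merging step in the second stage can be illustrated with a minimal Python sketch. The abstract does not specify the merging rule, so the sketch assumes one plausible choice: a sub-word boundary is placed wherever any coarse predictor places a word boundary, so that each sub-word is a span on which all predictors agree. The function names `boundaries` and `merge_into_subwords` and the toy inputs are hypothetical, POS information is ignored, and this is not presented as the paper's implementation.

```python
from typing import List, Set


def boundaries(words: List[str]) -> Set[int]:
    """Return the character offsets at which a word ends."""
    cuts, pos = set(), 0
    for w in words:
        pos += len(w)
        cuts.add(pos)
    return cuts


def merge_into_subwords(segmentations: List[List[str]]) -> List[str]:
    """Split the sentence wherever ANY predictor places a boundary.

    Assumed rule (not taken from the abstract): the sub-word boundaries
    are the union of the boundaries proposed by the coarse predictors.
    """
    sentence = "".join(segmentations[0])
    for seg in segmentations[1:]:
        if "".join(seg) != sentence:
            raise ValueError("all segmentations must cover the same sentence")

    cuts: Set[int] = set()
    for seg in segmentations:
        cuts |= boundaries(seg)

    subwords, start = [], 0
    for end in sorted(cuts):
        if end > start:  # skip zero-length spans from empty words
            subwords.append(sentence[start:end])
            start = end
    return subwords


# Toy example: two hypothetical coarse segmentations of the string "abcde".
word_based = ["ab", "cde"]
char_based = ["ab", "c", "de"]
subwords = merge_into_subwords([word_based, char_based])
print(subwords)
```

Under this assumption, a fine-grained tagger would then decide how adjacent sub-words are bracketed back into words and which POS tag each resulting word receives, operating on a much shorter sequence than the original characters.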