Exploring deterministic constraints: from a constrained English POS tagger to an efficient ILP solution to Chinese word segmentation

Authors:
Qiuye Zhao;Mitch Marcus
Affiliations:
University of Pennsylvania;University of Pennsylvania
Venue:
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Year:
2012

Citing 12
Cited 0

A stochastic finite-state word-segmentation algorithm for Chinese

Computational Linguistics
Building a large annotated corpus of English: the penn treebank

Computational Linguistics - Special issue on using large corpora: II
Feature-rich part-of-speech tagging with a cyclic dependency network

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Integer linear programming inference for conditional random fields

ICML '05 Proceedings of the 22nd international conference on Machine learning
Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms

EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
Interactive information extraction with constrained conditional random fields

AAAI'04 Proceedings of the 19th national conference on Artifical intelligence
Word lattice reranking for Chinese word segmentation and part-of-speech tagging

COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Concise integer linear programming formulations for dependency parsing

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
Minimized models for unsupervised part-of-speech tagging

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
An error-driven word-character hybrid model for joint Chinese word segmentation and POS tagging

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
Web-scale features for full-scale parsing

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
A stacked sub-word model for joint Chinese word segmentation and part-of-speech tagging

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1

Quantified Score

Hi-index	0.00

Visualization

Abstract

We show for both English POS tagging and Chinese word segmentation that with proper representation, large number of deterministic constraints can be learned from training examples, and these are useful in constraining probabilistic inference. For tagging, learned constraints are directly used to constrain Viterbi decoding. For segmentation, character-based tagging constraints can be learned with the same templates. However, they are better applied to a word-based model, thus an integer linear programming (ILP) formulation is proposed. For both problems, the corresponding constrained solutions have advantages in both efficiency and accuracy.