Chinese word segmentation as LMR tagging

Authors:
Nianwen Xue; Libin Shen
Affiliations:
University of Pennsylvania, Philadelphia, PA; University of Pennsylvania, Philadelphia, PA
Venue:
SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
Year:
2003

Abstract

In this paper we present Chinese word segmentation algorithms based on LMR tagging, in which each character is tagged according to its position within a word. Our LMR taggers are implemented as Maximum Entropy Markov Models, and we use Transformation-Based Learning to combine the results of two LMR taggers that scan the input in opposite directions. Our system achieves F-scores of 95.9% and 91.6% on the Academia Sinica corpus and the Hong Kong City University corpus, respectively.
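
The abstract does not spell out the tag set, so the following is only a minimal sketch of how segmentation can be recast as per-character tagging. It assumes four tags: LL (left edge of a word), MM (word-internal), RR (right edge), and LR (a single-character word). The function names `words_to_lmr` and `lmr_to_words` and the example sentence are hypothetical, chosen for illustration, and the sketch does not include the Maximum Entropy Markov Model taggers or the Transformation-Based Learning combination step.

```python
from typing import List, Tuple

# Assumed tag set: LL = leftmost character of a multi-character word,
# MM = word-internal character, RR = rightmost character,
# LR = a character that forms a word by itself.


def words_to_lmr(words: List[str]) -> List[Tuple[str, str]]:
    """Convert a segmented sentence into (character, tag) pairs."""
    tagged: List[Tuple[str, str]] = []
    for word in words:
        if len(word) == 1:
            tagged.append((word, "LR"))
            continue
        for i, ch in enumerate(word):
            if i == 0:
                tag = "LL"
            elif i == len(word) - 1:
                tag = "RR"
            else:
                tag = "MM"
            tagged.append((ch, tag))
    return tagged


def lmr_to_words(tagged: List[Tuple[str, str]]) -> List[str]:
    """Recover words from (character, tag) pairs.

    A word starts at LL or LR and ends at RR or LR. Inconsistent tag
    sequences (possible in tagger output) are handled leniently by
    closing any open word when a new one starts.
    """
    words: List[str] = []
    current = ""
    for ch, tag in tagged:
        if tag in ("LL", "LR") and current:
            words.append(current)
            current = ""
        current += ch
        if tag in ("RR", "LR"):
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


if __name__ == "__main__":
    sentence = ["上海", "计划", "到", "本世纪", "末"]
    pairs = words_to_lmr(sentence)
    print(" ".join(f"{ch}/{tag}" for ch, tag in pairs))
    print(lmr_to_words(pairs) == sentence)
```

Under this encoding a segmenter only has to predict one tag per character, which is what lets a sequence tagger be applied to the task. Scanning direction matters only to the tagger that produces the tags, not to the conversion shown here.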