Punctuation as implicit annotations for chinese word segmentation

Authors:
Zhongguo Li;Maosong Sun
Affiliations:
-;-
Venue:
Computational Linguistics
Year:
2009

Citing 11
Cited 5

A maximum entropy approach to natural language processing

Computational Linguistics
Self-Supervised Chinese Word Segmentation

IDA '01 Proceedings of the 4th International Conference on Advances in Intelligent Data Analysis
A maximum entropy approach to named entity recognition

A maximum entropy approach to named entity recognition
Chinese word segmentation without using lexicon and hand-crafted training data

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
Accessor variety criteria for Chinese word extraction

Computational Linguistics
Shallow parsing with conditional random fields

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Some applications of tree-based modelling to speech and language

HLT '89 Proceedings of the workshop on Speech and Natural Language
Chinese Word Segmentation and Named Entity Recognition: A Pragmatic Approach

Computational Linguistics
Contextual dependencies in unsupervised word segmentation

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Chinese segmentation and new word detection using conditional random fields

COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Unsupervised segmentation of Chinese text by use of branching entropy

COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions

Parsing the internal structure of words: a new paradigm for Chinese word segmentation

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Punctuation: making a point in unsupervised dependency parsing

CoNLL '11 Proceedings of the Fifteenth Conference on Computational Natural Language Learning
Non-parametric bayesian segmentation of Japanese noun phrases

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Enhancing Chinese word segmentation using unlabeled data

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Integrating Generative and Discriminative Character-Based Models for Chinese Word Segmentation

ACM Transactions on Asian Language Information Processing (TALIP)

Quantified Score

Hi-index	0.00

Visualization

Abstract

We present a Chinese word segmentation model learned from punctuation marks which are perfect word delimiters. The learning is aided by a manually segmented corpus. Our method is considerably more effective than previous methods in unknown word recognition. This is a step toward addressing one of the toughest problems in Chinese word segmentation.