A Chinese sentence segmentation approach based on comma

Authors:
Shengqin Xu;Fang Kong;Peifeng Li;Qiaoming Zhu
Affiliations:
Natural Language Processing Lab, Soochow University, Suzhou, Jiangsu, China, School of Computer Science & Technology, Soochow University, Suzhou, Jiangsu, China;Natural Language Processing Lab, Soochow University, Suzhou, Jiangsu, China, School of Computer Science & Technology, Soochow University, Suzhou, Jiangsu, China;Natural Language Processing Lab, Soochow University, Suzhou, Jiangsu, China, School of Computer Science & Technology, Soochow University, Suzhou, Jiangsu, China;Natural Language Processing Lab, Soochow University, Suzhou, Jiangsu, China, School of Computer Science & Technology, Soochow University, Suzhou, Jiangsu, China
Venue:
CLSW'12 Proceedings of the 13th Chinese conference on Chinese Lexical Semantics
Year:
2012

Abstract

Chinese sentence segmentation is a fundamental step in natural language processing. Reliable sentence boundary detection is a prerequisite for subsequent NLP tasks such as parsing and machine translation. In this paper, we treat the comma as a candidate sentence boundary marker and divide commas into two major types: true boundaries (EOS) and pseudo boundaries (Non-EOS). We then present and implement a Chinese sentence segmentation framework based on two-layer classifiers. Experimental results on Chinese Treebank 6.0 show that our model achieves an overall F-measure of 90.7%, an improvement of 1.5%.
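
The idea of treating each comma as either EOS or Non-EOS can be illustrated with a minimal sketch. It is not the authors' system: the abstract does not describe the two-layer classifiers or their features, so the classifier here is just a pluggable callable. The names `segment`, `toy_is_eos`, and the pronoun heuristic are hypothetical and stand in for a trained model.

```python
from typing import Callable, List

COMMA = "，"            # full-width Chinese comma
TERMINATORS = "。！？"   # unambiguous sentence-final punctuation


def segment(text: str, is_eos: Callable[[str, int], bool]) -> List[str]:
    """Split text after terminators and after commas that is_eos labels EOS."""
    sentences: List[str] = []
    start = 0
    for i, ch in enumerate(text):
        if ch in TERMINATORS or (ch == COMMA and is_eos(text, i)):
            sentences.append(text[start:i + 1])
            start = i + 1
    if start < len(text):
        # Keep any trailing text that has no closing punctuation.
        sentences.append(text[start:])
    return sentences


def toy_is_eos(text: str, i: int) -> bool:
    """Hypothetical placeholder rule, not from the paper: call a comma EOS
    when the next character is a pronoun that may start a new clause."""
    return i + 1 < len(text) and text[i + 1] in "我他她"


if __name__ == "__main__":
    for sentence in segment("今天下雨，我没出门。明天再说。", toy_is_eos):
        print(sentence)
```

Passing the classifier as a function keeps the splitting logic independent of the model, so a trained comma classifier could replace `toy_is_eos` without changing `segment`.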