A Chinese sentence segmentation approach based on comma

  • Authors:
  • Shengqin Xu; Fang Kong; Peifeng Li; Qiaoming Zhu

  • Affiliations:
  • Natural Language Processing Lab, Soochow University, Suzhou, Jiangsu, China; School of Computer Science & Technology, Soochow University, Suzhou, Jiangsu, China (all authors)

  • Venue:
  • CLSW'12: Proceedings of the 13th Chinese Conference on Chinese Lexical Semantics
  • Year:
  • 2012

Abstract

Chinese sentence segmentation is a fundamental step in natural language processing: accurate sentence boundary detection is a prerequisite for subsequent NLP tasks such as parsing and machine translation. In this paper, we treat the comma as a potential sentence boundary and classify each comma into one of two types: true end-of-sentence (EOS) commas and pseudo (Non-EOS) commas. We then present and implement a Chinese sentence segmentation framework based on two-layer classifiers. Experimental results on Chinese Treebank 6.0 show that our model achieves an overall F-measure of 90.7%, an improvement of 1.5%.
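The abstract only outlines the framework, so the following is a minimal sketch of the segmentation and evaluation steps under stated assumptions, not the authors' implementation. It assumes that 。, ！ and ？ always end a sentence and that each full-width comma ， is labelled EOS or Non-EOS by a comma classifier. `toy_classifier` is a hypothetical rule standing in for the paper's two-layer classifiers, whose features are not given here. The names `boundary_positions`, `segment` and `f_measure` are likewise illustrative. The F-measure is computed as the harmonic mean of precision and recall over predicted versus gold boundary positions.

```python
from typing import Callable, List, Set

# Assumption for this sketch: these marks always end a sentence.
HARD_STOPS = set("。！？")
COMMA = "，"  # full-width Chinese comma, the candidate boundary

# A comma classifier sees the text left and right of a comma and
# returns True if that comma should be treated as EOS.
CommaClassifier = Callable[[str, str], bool]


def boundary_positions(text: str, is_eos: CommaClassifier) -> Set[int]:
    """Return the indices of punctuation marks that end a sentence."""
    positions = set()
    for i, ch in enumerate(text):
        if ch in HARD_STOPS:
            positions.add(i)
        elif ch == COMMA and is_eos(text[:i], text[i + 1:]):
            positions.add(i)
    return positions


def segment(text: str, is_eos: CommaClassifier) -> List[str]:
    """Split text immediately after every boundary position."""
    sentences, start = [], 0
    for i in sorted(boundary_positions(text, is_eos)):
        sentences.append(text[start:i + 1])
        start = i + 1
    if start < len(text):  # trailing text without final punctuation
        sentences.append(text[start:])
    return sentences


def f_measure(predicted: Set[int], gold: Set[int]) -> float:
    """Harmonic mean of precision and recall over boundary positions."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def toy_classifier(left: str, right: str) -> bool:
    """Hypothetical rule, NOT the paper's features: treat a comma as EOS
    when the following text starts with a subject pronoun."""
    return right.startswith(("我", "他", "她"))


if __name__ == "__main__":
    text = "天气很冷，我们没出门，在家看书。"
    print(segment(text, toy_classifier))
```

On the sample text, the first comma is followed by 我 and is labelled EOS, while the second is followed by 在 and stays Non-EOS, giving `["天气很冷，", "我们没出门，在家看书。"]`. Keeping the classifier as a plain callable means a trained model could replace `toy_classifier` without changing `segment`.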