CSeg&Tag1.0: a practical word segmenter and POS tagger for Chinese texts

  • Authors:
  • Sun Maosong;Shen Dayang;Huang Changning

  • Affiliations:
  • Tsinghua University, Beijing, P.R. China;Tsinghua University, Beijing, P.R. China;Tsinghua University, Beijing, P.R. China

  • Venue:
  • ANLC '97 Proceedings of the fifth conference on Applied natural language processing
  • Year:
  • 1997

Abstract

Chinese word segmentation and POS tagging are two key techniques in many Chinese information processing applications. Considerable effort has been devoted to this research over the last decade, but unfortunately no practical, high-performance system for unrestricted text is available to date. This paper introduces CSeg&Tag1.0, a Chinese word segmenter and POS tagger that unifies these two procedures in one model. Preliminary open tests show a segmentation precision of about 98.0% to 99.3% and a POS tagging precision of about 91.0% to 97.1%; for unknown words, recall ranges from 95.0% to 99.0% and precision from 87.6% to 95.3%. The processing speed is about 100 characters per second on a Pentium 133 PC. Work on improving the system's performance is ongoing.
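
The abstract reports precision and recall figures without stating how they were computed. As a rough illustration only, the sketch below shows one common convention for scoring word segmentation: a predicted word counts as correct when its character span exactly matches a span in the reference segmentation. The function names `to_spans` and `segmentation_scores` and the sample sentence are hypothetical, and this is not claimed to be the evaluation procedure used for CSeg&Tag1.0.

```python
def to_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans, pos = set(), 0
    for word in words:
        spans.add((pos, pos + len(word)))
        pos += len(word)
    return spans


def segmentation_scores(gold_words, predicted_words):
    """Return (precision, recall) under exact span matching (an assumed convention)."""
    gold = to_spans(gold_words)
    pred = to_spans(predicted_words)
    correct = len(gold & pred)  # words whose boundaries match on both sides
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical example: the reference and the system output disagree on the first two words.
gold = ["研究", "生命", "的", "起源"]
pred = ["研究生", "命", "的", "起源"]
precision, recall = segmentation_scores(gold, pred)
print(precision, recall)
```

Under this convention a single misplaced boundary penalizes both neighboring words, which is why segmentation precision and recall tend to move together.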