An integrated approach to Chinese word segmentation and part-of-speech tagging

  • Authors:
  • Maosong Sun;Dongliang Xu;Benjamin K. Tsou;Huaming Lu

  • Affiliations:
  • National Lab. of Intelligent Tech. & Systems, Tsinghua University, Beijing, China;National Lab. of Intelligent Tech. & Systems, Tsinghua University, Beijing, China;Language Information Sciences Research Centre, City University of Hong Kong;Beijing Information Science and Technology University, Beijing, China

  • Venue:
  • ICCPOL'06 Proceedings of the 21st international conference on Computer Processing of Oriental Languages: beyond the orient: the research challenges ahead
  • Year:
  • 2006

Abstract

This paper discusses and compares various schemes for integrating Chinese word segmentation and part-of-speech tagging, within the framework of true integration and pseudo-integration. A true-integration approach, named ‘divide-and-conquer integration’, is presented. Experiments on a manually word-segmented and part-of-speech-tagged corpus of about 5.8 million words show that this true integration achieves 98.61% F-measure in word segmentation, 95.18% F-measure in part-of-speech tagging, and 93.86% F-measure in word segmentation and part-of-speech tagging taken together, outperforming all other kinds of combinations to some extent. The experimental results demonstrate the potential for further improving the performance of Chinese word segmentation and part-of-speech tagging.
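The abstract reports F-measures but does not say how they are computed. As an illustration only, the sketch below shows one common span-based way to score segmentation alone and segmentation plus tagging together. The function names, the toy sentence, and its tags are hypothetical and are not taken from the paper, and the paper's own scoring procedure (including how its tagging-only figure is defined) may differ.

```python
from typing import List, Optional, Set, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)
Span = Tuple[int, int, Optional[str]]


def spans(tokens: List[Token], with_tags: bool) -> Set[Span]:
    """Turn a tagged word sequence into character-offset spans."""
    out: Set[Span] = set()
    start = 0
    for word, tag in tokens:
        end = start + len(word)
        # Drop the tag when only segmentation boundaries are scored.
        out.add((start, end, tag if with_tags else None))
        start = end
    return out


def f_measure(gold: List[Token], pred: List[Token], with_tags: bool = False) -> float:
    """Harmonic mean of precision and recall over matching spans."""
    g = spans(gold, with_tags)
    p = spans(pred, with_tags)
    correct = len(g & p)
    if correct == 0:
        return 0.0
    precision = correct / len(p)
    recall = correct / len(g)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: one gold-standard sentence and one system output.
gold = [("我", "r"), ("喜欢", "v"), ("北京", "ns")]
pred = [("我", "r"), ("喜", "v"), ("欢", "v"), ("北京", "n")]

seg_f = f_measure(gold, pred)                     # segmentation only
joint_f = f_measure(gold, pred, with_tags=True)   # word and tag must both match
```

In this toy case two of the four predicted words match the three gold words, so the segmentation score is 4/7; only one word also carries the correct tag, so the joint score drops to 2/7. That ordering mirrors the abstract, where the joint figure is the lowest of the three.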