Revising word lattice using support vector machine for Chinese word segmentation

  • Authors:
  • Ming Zhong;Sheng Wang;Ming Wu

  • Affiliations:
  • Wuhan University, Wuhan, China;Wuhan University, Wuhan, China;Zhongnan University of Economics and Law, Wuhan, China

  • Venue:
  • Proceedings of the 14th International Conference on Information Integration and Web-based Applications & Services
  • Year:
  • 2012

Abstract

This paper presents a novel Chinese word segmentation approach that combines dictionary-based and statistics-based techniques. First, we transform a linear sentence into a word lattice using a dictionary. Then we apply a classification method based on a support vector machine to two main tasks: resolving segmentation ambiguities and recognizing out-of-vocabulary words. The classifier determines the position of the current character within a word, using some of its surrounding characters as features. The disambiguation and recognition results are applied by pruning edges from, and appending edges to, the word lattice. Lastly, we output the segmentation result by searching for the shortest path in the word lattice. Our experimental results show that the approach achieves an F-score of 92.8% on the PKU closed test of the second SIGHAN bakeoff.
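The first and last steps of the pipeline (building a word lattice from a dictionary, then taking the shortest path through it) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `build_lattice` and `shortest_path`, the `max_len` parameter, the toy dictionary, and the choice of unit edge weights are all assumptions made for illustration, since the abstract does not specify them. The SVM-based pruning and appending of edges is omitted.

```python
def build_lattice(sentence, dictionary, max_len=4):
    """Map each start position i to the end positions j of its lattice edges.

    An edge (i, j) stands for the candidate word sentence[i:j].
    """
    n = len(sentence)
    edges = {i: [] for i in range(n)}
    for i in range(n):
        # Assumption: every single character is an edge, so the lattice
        # is always connected even when the dictionary has no match.
        edges[i].append(i + 1)
        for j in range(i + 2, min(n, i + max_len) + 1):
            if sentence[i:j] in dictionary:
                edges[i].append(j)
    return edges


def shortest_path(sentence, edges):
    """Return the segmentation with the fewest words (unit edge weights)."""
    n = len(sentence)
    inf = float("inf")
    best = [inf] * (n + 1)  # best[j]: fewest words covering sentence[:j]
    back = [-1] * (n + 1)   # back[j]: start of the last word ending at j
    best[0] = 0
    # Positions are already in topological order, so one pass suffices.
    for i in range(n):
        if best[i] == inf:
            continue
        for j in edges[i]:
            if best[i] + 1 < best[j]:
                best[j] = best[i] + 1
                back[j] = i
    # Walk the back-pointers from the end of the sentence to recover words.
    words = []
    j = n
    while j > 0:
        i = back[j]
        words.append(sentence[i:j])
        j = i
    return words[::-1]


# Hypothetical toy dictionary and sentence, for illustration only.
toy_dictionary = {"研究", "研究生", "生命", "起源"}
sentence = "研究生命起源"
lattice = build_lattice(sentence, toy_dictionary)
segmentation = shortest_path(sentence, lattice)
print(segmentation)
```

In this toy lattice the paths 研究 / 生命 / 起源 and 研究生 / 命 / 起源 both have three edges, so the shortest-path search alone cannot tell them apart and the sketch simply keeps the first one it finds. This is the kind of ambiguity that the classifier-driven edge pruning described in the abstract is meant to resolve before the path search runs.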