An integrative Chinese lexical analyzer based on maximum matching and second-maximum matching segmentation

  • Authors:
  • Xiao Sun; Degen Huang

  • Affiliations:
  • Department of Computer Science and Engineering, Dalian University of Technology, Dalian, Liaoning, P.R. China; Department of Computer Science and Engineering, Dalian University of Technology, Dalian, Liaoning, P.R. China

  • Venue:
  • ICCOMP'06: Proceedings of the 10th WSEAS International Conference on Computers
  • Year:
  • 2006

Abstract

This paper presents an integrative lexical analysis mechanism that addresses a limitation of existing "pipelined" lexical analysis systems. The mechanism extends the Maximum Matching and Second-Maximum Matching model: POS (part-of-speech) tags and all candidate words are included in a directed graph, and the Dijkstra algorithm is applied to find the minimum-cost path through it. With the integrative model, word segmentation, POS tagging, and unknown-word recognition are accomplished simultaneously, so conflicts among the lexical analysis tasks are avoided and high precision can be achieved. In an open test, the system reaches a precision of 98.65% and a recall of 98.96%.
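
The minimum-cost path idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the lexicon, the POS tags, the edge costs, the `UNKNOWN_COST` penalty, and the function names `build_edges` and `segment` are all hypothetical, and candidate words are generated by plain dictionary lookup rather than by the Maximum Matching and Second-Maximum Matching model. Nodes are character positions, each edge is a candidate word carrying a POS tag and a cost, and Dijkstra's algorithm selects the cheapest path from the start of the sentence to its end.

```python
import heapq

# Hypothetical toy lexicon: word -> (POS tag, cost). The costs are invented
# for illustration; a real system might derive them from corpus statistics.
LEXICON = {
    "研究": ("v", 2.0),
    "研究生": ("n", 2.5),
    "生命": ("n", 2.0),
    "命": ("n", 4.0),
    "的": ("u", 1.0),
    "起源": ("n", 2.0),
}
UNKNOWN_COST = 10.0  # assumed penalty for a single out-of-lexicon character


def build_edges(sentence, max_len=4):
    """Return edges[i] = list of (j, word, pos, cost) for candidates sentence[i:j]."""
    n = len(sentence)
    edges = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, min(n, i + max_len) + 1):
            word = sentence[i:j]
            if word in LEXICON:
                pos, cost = LEXICON[word]
                edges[i].append((j, word, pos, cost))
        # Fallback edge so every position can always advance by one character.
        if sentence[i] not in LEXICON:
            edges[i].append((i + 1, sentence[i], "x", UNKNOWN_COST))
    return edges


def segment(sentence):
    """Dijkstra over character positions; returns [(word, pos), ...]."""
    n = len(sentence)
    if n == 0:
        return []
    edges = build_edges(sentence)
    dist = [float("inf")] * (n + 1)
    back = [None] * (n + 1)  # back[j] = (i, word, pos) of the best edge into j
    dist[0] = 0.0
    heap = [(0.0, 0)]
    while heap:
        d, i = heapq.heappop(heap)
        if d > dist[i]:
            continue  # stale heap entry
        if i == n:
            break  # reached the end of the sentence
        for j, word, pos, cost in edges[i]:
            nd = d + cost
            if nd < dist[j]:
                dist[j] = nd
                back[j] = (i, word, pos)
                heapq.heappush(heap, (nd, j))
    # Walk the back-pointers from the end to recover the chosen words.
    path = []
    k = n
    while k > 0:
        i, word, pos = back[k]
        path.append((word, pos))
        k = i
    path.reverse()
    return path


if __name__ == "__main__":
    print(segment("研究生命的起源"))
```

With these toy costs the path 研究 / 生命 / 的 / 起源 (total cost 7.0) beats 研究生 / 命 / 的 / 起源 (total cost 9.5), so segmentation and tag assignment fall out of a single search. A fuller model would presumably also put costs on transitions between adjacent POS tags, which this sketch omits.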