Automatic recognition of Chinese unknown words based on roles tagging

  • Authors:
  • Kevin Zhang;Qun Liu;Hao Zhang;Xue-Qi Cheng

  • Affiliations:
  • Institute of Computing Technology, Beijing, P. R. China;Institute of Computing Technology, Beijing, P. R. China;Institute of Computing Technology, Beijing, P. R. China;Institute of Computing Technology, Beijing, P. R. China

  • Venue:
  • SIGHAN '02 Proceedings of the first SIGHAN workshop on Chinese language processing - Volume 18
  • Year:
  • 2002

Abstract

This paper presents a unified solution, based on the idea of "roles tagging", to the complicated problem of recognizing Chinese unknown words. In our approach, an unknown word is identified from its component tokens and its context tokens. To capture the function each token plays, we use the concept of roles. Roles are tagged by applying the Viterbi algorithm in the manner of a POS tagger. In the resulting most probable role sequence, all eligible unknown words are then recognized through maximum pattern matching. We achieve excellent precision and recall, especially for person names and transliterations. Experimental results in our system ICTCLAS show that the roles-tagging approach is simple yet effective.
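The two stages in the abstract can be illustrated with a minimal sketch. Stage one is Viterbi decoding of roles over a hidden Markov model, and stage two is longest-first pattern matching on the resulting role string. Everything concrete here is hypothetical and chosen for illustration only. That includes the role set (O = other, P = preceding context, S = surname, G = given-name character, F = following context), every probability, the example tokens, the name patterns, and the floor probability used for unseen events. None of it is the role inventory or the trained parameters of ICTCLAS, and reading "maximum pattern matching" as longest-match-first is an assumption.

```python
import math

# Hypothetical toy role set and HMM parameters (not ICTCLAS's trained model).
ROLES = ["O", "P", "S", "G", "F"]
FLOOR = 1e-6  # assumed probability for any unseen start/transition/emission

START = {"O": 0.6, "P": 0.3, "S": 0.1}
TRANS = {
    "O": {"O": 0.6, "P": 0.3, "S": 0.1},
    "P": {"S": 0.8, "O": 0.2},
    "S": {"G": 0.9, "O": 0.1},
    "G": {"G": 0.5, "F": 0.4, "O": 0.1},
    "F": {"O": 0.7, "P": 0.3},
}
EMIT = {
    "O": {"记者": 0.3, "报道": 0.3},
    "P": {"记者": 0.5},
    "S": {"王": 0.6},
    "G": {"小": 0.4, "明": 0.4, "王": 0.05},
    "F": {"报道": 0.5},
}

# Hypothetical person-name role patterns, tried longest first.
PATTERNS = sorted(["SG", "SGG"], key=len, reverse=True)


def logp(table, key):
    """Log-probability lookup with a floor for unseen keys."""
    return math.log(table.get(key, FLOOR))


def viterbi(tokens):
    """Return the most probable role sequence for the tokens."""
    # score[r]: best log-probability of any role path ending in role r
    score = {r: logp(START, r) + logp(EMIT[r], tokens[0]) for r in ROLES}
    back = []  # back[t][r]: best previous role for role r at position t + 1
    for tok in tokens[1:]:
        new_score, pointers = {}, {}
        for r in ROLES:
            prev = max(ROLES, key=lambda q: score[q] + logp(TRANS[q], r))
            new_score[r] = score[prev] + logp(TRANS[prev], r) + logp(EMIT[r], tok)
            pointers[r] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final role.
    path = [max(ROLES, key=lambda r: score[r])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def recognize(tokens, roles):
    """Scan the role string left to right, taking the longest matching pattern."""
    tags = "".join(roles)
    words, i = [], 0
    while i < len(tags):
        for pat in PATTERNS:
            if tags.startswith(pat, i):
                words.append("".join(tokens[i:i + len(pat)]))
                i += len(pat)
                break
        else:
            i += 1  # no pattern starts here; move on
    return words


tokens = ["记者", "王", "小", "明", "报道"]  # "reporter Wang Xiaoming reports"
roles = viterbi(tokens)
print(roles)                     # ['P', 'S', 'G', 'G', 'F']
print(recognize(tokens, roles))  # ['王小明']
```

Log-space scores keep long sentences from underflowing. Trying longer patterns first means "SGG" wins over its prefix "SG", so a three-character name is not cut short. The context roles P and F never appear in the patterns, yet they still raise the probability of the S/G reading during decoding.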