Statistically-enhanced new word identification in a rule-based Chinese system

  • Authors:
  • Andi Wu; Zixin Jiang

  • Affiliations:
  • Microsoft Research, Redmond, WA; Microsoft Research, Redmond, WA

  • Venue:
  • CLPW '00 Proceedings of the second workshop on Chinese language processing: held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics - Volume 12
  • Year:
  • 2000

Abstract

This paper presents a mechanism for identifying new words in Chinese text, in which probabilities are used to filter candidate character strings and to assign a part of speech (POS) to each selected string within a rule-based system. The mechanism avoids both the sparse-data problem of purely statistical approaches and the over-generation problem of rule-based approaches. It improves parser coverage and provides a tool for the lexical acquisition of new words.
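
The abstract does not spell out how the probabilities are computed or applied, so the following is only a minimal sketch of the general idea: candidate strings proposed by rules are filtered by a probability score, and the surviving strings receive their most probable POS. Everything in it is an assumption made for illustration: the placeholder characters "A" to "D", the toy probability tables, the threshold value, and the function names `filter_candidates` and `assign_pos` are hypothetical and are not taken from the paper.

```python
from math import prod

# Hypothetical toy statistics (not from the paper): probability that each
# character is used as a standalone word. Latin letters stand in for
# Chinese characters so that no real corpus figures are implied.
STANDALONE_PROB = {"A": 0.9, "B": 0.05, "C": 0.1, "D": 0.8}

# Hypothetical POS distribution keyed by the final character of a candidate.
POS_GIVEN_LAST_CHAR = {"C": {"noun": 0.7, "verb": 0.3}}


def filter_candidates(candidates, threshold=0.05):
    """Keep candidates whose characters are jointly unlikely to be separate words.

    Assumption: a low product of standalone probabilities suggests the
    characters belong together as one new word. Unknown characters get a
    neutral default of 0.5.
    """
    kept = []
    for cand in candidates:
        score = prod(STANDALONE_PROB.get(ch, 0.5) for ch in cand)
        if score < threshold:
            kept.append(cand)
    return kept


def assign_pos(candidate):
    """Return the most probable POS for a candidate, or None if no data exists."""
    dist = POS_GIVEN_LAST_CHAR.get(candidate[-1], {})
    return max(dist, key=dist.get) if dist else None


if __name__ == "__main__":
    # Candidate strings as a rule component might propose them (assumed input).
    proposed = ["BC", "AD"]
    for word in filter_candidates(proposed):
        print(word, assign_pos(word))
```

In this sketch the rules are responsible for proposing candidates and the statistics only prune and label them, which mirrors the division of labour the abstract describes: the probability filter limits rule over-generation, while the rules keep the statistical component from having to model every possible string.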