A stochastic finite-state word-segmentation algorithm for Chinese

  • Authors:
  • Richard Sproat;Chilin Shih;William Gale;Nancy Chang

  • Affiliations:
  • AT&T Bell Laboratories, Murray Hill, NJ;AT&T Bell Laboratories, Murray Hill, NJ;AT&T Bell Laboratories, Murray Hill, NJ;Harvard University, Cambridge, MA

  • Venue:
  • ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics
  • Year:
  • 1994

Quantified Score

Hi-index 0.00

Visualization

Abstract

We present a stochastic finite-state model for segmenting Chinese text into dictionary entries and productively derived words, and providing pronunciations for these words; the method incorporates a class-based model in its treatment of personal names. We also evaluate the system's performance, taking into account the fact that people often do not agree on a single segmentation.