Jurilinguistic engineering in Cantonese Chinese: an N-gram-based speech to text transcription system

  • Authors:
  • B. K. T'sou;K. K. Sin;S. W. K. Chan;T. B. Y. Lai;C. Lun;K. T. Ko;G. K. K. Chan;L. Y. L. Cheung

  • Affiliations:
  • City University of Hong Kong, Kowloon, Hong Kong SAR, China (all authors)

  • Venue:
  • COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 2
  • Year:
  • 2000

Abstract

This paper reports a Cantonese Chinese transcription system that automatically converts stenograph code to Chinese characters. The major challenge in developing such a system is the critical homocode problem caused by homonymy: a single stenograph code can correspond to several characters. A statistical N-gram model is used to compute the most probable combination of characters. With a 0.85-million-character corpus of domain-specific training data and additional enhancement measures, the bigram and trigram implementations achieve 95% and 96% accuracy respectively, compared with 78% for the baseline model. System performance is comparable with that of other advanced Chinese speech-to-text input applications under development. The system meets an urgent need of the Judiciary of post-1997 Hong Kong.
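
To illustrate how an N-gram model can resolve homocodes, the following is a minimal sketch of bigram decoding with a Viterbi-style search. It is not the authors' implementation. The code strings (Jyutping-like stand-ins for stenograph codes), the candidate table, the four-entry toy corpus, and the add-one smoothing are all assumptions made for illustration only.

```python
import math
from collections import defaultdict

# Hypothetical homocode table: each code maps to several candidate characters.
# The code strings are illustrative stand-ins, not real stenograph codes.
CANDIDATES = {
    "faat3": ["法", "發"],
    "ting4": ["庭", "停"],
}

# Toy training corpus (assumption); a real system would use a large domain corpus.
CORPUS = ["法庭", "法庭", "發現", "停車"]


def train_bigram(corpus):
    """Count how often each character appears as context and each bigram occurs."""
    context = defaultdict(int)
    bigram = defaultdict(int)
    for sentence in corpus:
        chars = ["<s>"] + list(sentence)
        for prev, cur in zip(chars, chars[1:]):
            context[prev] += 1
            bigram[(prev, cur)] += 1
    return context, bigram


def log_prob(prev, cur, context, bigram, vocab_size):
    """Add-one smoothed log P(cur | prev); smoothing choice is an assumption."""
    return math.log((bigram[(prev, cur)] + 1) / (context[prev] + vocab_size))


def decode(codes, candidates, context, bigram, vocab_size):
    """Return the highest-scoring character sequence for a list of codes."""
    # best maps the last character to (log score, best sequence ending there).
    best = {"<s>": (0.0, "")}
    for code in codes:
        nxt = {}
        for cur in candidates[code]:
            score, path = max(
                (
                    (s + log_prob(prev, cur, context, bigram, vocab_size), p)
                    for prev, (s, p) in best.items()
                ),
                key=lambda t: t[0],
            )
            nxt[cur] = (score, path + cur)
        best = nxt
    return max(best.values(), key=lambda t: t[0])[1]


context, bigram = train_bigram(CORPUS)
vocab_size = len({ch for sentence in CORPUS for ch in sentence})
print(decode(["faat3", "ting4"], CANDIDATES, context, bigram, vocab_size))  # prints 法庭
```

In this toy setting both codes are ambiguous, and the bigram counts favour 法庭 ("court") over the other three combinations. A trigram variant would condition each character on the two preceding characters rather than one.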