Jurilinguistic engineering in Cantonese Chinese: an N-gram-based speech to text transcription system

Authors:
B. K. T'sou;K. K. Sin;S. W. K. Chan;T. B. Y. Lai;C. Lun;K. T. Ko;G. K. K. Chan;L. Y. L. Cheung
Affiliations:
City University of Hong Kong, Kowloon, Hong Kong SAR, China (all authors)
Venue:
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 2
Year:
2000

Abstract

This paper reports a Cantonese Chinese transcription system that automatically converts stenograph code to Chinese characters. The major challenge in developing such a system is the critical homocode problem: because of homonymy, a single stenograph code can correspond to several different characters. A statistical N-gram model is used to compute the most probable combination of characters. Supplemented with a 0.85-million-character corpus of domain-specific training data and with enhancement measures, the bigram and trigram implementations achieve 95% and 96% accuracy respectively, compared with 78% for the baseline model. This performance is comparable with that of other advanced Chinese Speech-to-Text input applications under development. The system meets an urgent need of the Judiciary of post-1997 Hong Kong.
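
To make the idea of resolving homocodes with an N-gram model concrete, the following is a minimal sketch of bigram decoding with a Viterbi search. It is an illustration under stated assumptions, not the authors' implementation: the romanized codes ("faat", "ting", "gun"), the toy lexicon, the four-word training corpus, and the add-one smoothing are all hypothetical choices made for the example, and the paper's enhancement measures are not reproduced.

```python
import math
from collections import defaultdict

START = "<s>"  # hypothetical sentence-start symbol


def train_bigram(corpus):
    """Count character unigrams and bigrams over a list of sentences."""
    unigrams = defaultdict(int)
    bigrams = defaultdict(int)
    for sentence in corpus:
        prev = START
        unigrams[prev] += 1
        for ch in sentence:
            unigrams[ch] += 1
            bigrams[(prev, ch)] += 1
            prev = ch
    return unigrams, bigrams


def log_prob(prev, ch, unigrams, bigrams, vocab_size):
    """Add-one smoothed log P(ch | prev); the smoothing method is an assumption."""
    count_pair = bigrams.get((prev, ch), 0)
    count_prev = unigrams.get(prev, 0)
    return math.log((count_pair + 1) / (count_prev + vocab_size))


def decode(codes, lexicon, unigrams, bigrams):
    """Viterbi search for the most probable character sequence for the codes."""
    vocab_size = len(unigrams)
    # best maps the last character of a partial path to (log score, path so far)
    best = {START: (0.0, [])}
    for code in codes:
        step = {}
        for ch in lexicon[code]:  # every character sharing this code (homocodes)
            score, path = max(
                (
                    (s + log_prob(p, ch, unigrams, bigrams, vocab_size), pth)
                    for p, (s, pth) in best.items()
                ),
                key=lambda item: item[0],
            )
            step[ch] = (score, path + [ch])
        best = step
    return "".join(max(best.values(), key=lambda item: item[0])[1])


# Toy data, invented for illustration only.
lexicon = {"faat": ["法", "發"], "ting": ["庭", "停"], "gun": ["官"]}
corpus = ["法庭", "法官", "發現", "停車"]
unigrams, bigrams = train_bigram(corpus)

print(decode(["faat", "ting"], lexicon, unigrams, bigrams))  # 法庭
```

In this toy setting the code "faat" is ambiguous between 法 and 發, and "ting" between 庭 and 停; the bigram counts favour the sequence seen in the training text. A trigram version would condition each character on the two preceding ones, at the cost of a larger search state.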