Self-organized language modeling for speech recognition
Readings in speech recognition
Hi-index | 0.00 |
A Cantonese Chinese transcription system to automatically convert stenograph code to Chinese characters is reported. The major challenge in developing such a system is the critical homocode problem because of homonymy. The statistical N-gram model is used to compute the best combination of characters. Supplemented with a 0.85 million character corpus of domain-specific training data and enhancement measures, the bigram and trigram implementations achieve 95% and 96% accuracy respectively, as compared with 78% accuracy in the baseline model. The system performance is comparable with other advanced Chinese Speech-to-Text input applications under development. The system meets an urgent need of the Judiciary of post-1997 Hong Kong.