PAT-tree-based keyword extraction for Chinese information retrieval
Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
Toward a unified approach to statistical language modeling for Chinese
ACM Transactions on Asian Language Information Processing (TALIP)
Chinese word segmentation based on maximum matching and word binding force
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1
Unknown word extraction for Chinese documents
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Tone-enhanced generalized character posterior probability (GCPP) for Cantonese LVCSR
Computer Speech and Language
ISCSLP'06 Proceedings of the 5th international conference on Chinese Spoken Language Processing
Hi-index | 0.01 |
While OOV is always a problem for most languages in ASR, in the Chinese case the problem can be avoided by utilizing character n-grams and moderate performances can be obtained. However, character n-gram has its own limitation and proper addition of new words can increase the ASR performance. Here we propose a discriminative lexicon adaptation approach for improved character accuracy, which not only adds new words but also deletes some words from the current lexicon. Different from other lexicon adaptation approaches, we consider the acoustic features and make our lexicon adaptation criterion consistent with that in the decoding process. The proposed approach not only improves the ASR character accuracy but also significantly enhances the performance of a character-based spoken document retrieval system.