Discriminative lexicon adaptation for improved character accuracy: a new direction in Chinese language modeling

Authors:
Yi-cheng Pan;Lin-shan Lee;Sadaoki Furui
Affiliations:
National Taiwan University, Taipei, Taiwan;National Taiwan University, Taipei, Taiwan;Tokyo Institute of Technology, Tokyo, Japan
Venue:
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2
Year:
2009

Abstract

Out-of-vocabulary (OOV) words are a problem for automatic speech recognition (ASR) in most languages, but in Chinese the problem can be avoided by using character n-grams, which yield moderate performance. Character n-grams have their own limitations, however, and properly adding new words to the lexicon can improve ASR performance. Here we propose a discriminative lexicon adaptation approach for improved character accuracy, which not only adds new words but also deletes some words from the current lexicon. Unlike other lexicon adaptation approaches, we take the acoustic features into account and make our lexicon adaptation criterion consistent with the criterion used in the decoding process. The proposed approach not only improves ASR character accuracy but also significantly enhances the performance of a character-based spoken document retrieval system.
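
The abstract names character accuracy as the target metric but does not define it. As a rough illustration only, the sketch below uses the conventional definition, 1 - (substitutions + deletions + insertions) / (number of reference characters), computed from a character-level edit distance. The function names and the example strings are assumptions made here for illustration, not the authors' scoring tool.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty hypothesis takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def character_accuracy(reference, hypothesis):
    """Conventional character accuracy: 1 - edit_distance / len(reference).

    Assumed definition (hypothetical helper); the value can be negative
    when the hypothesis contains many insertions.
    """
    ref = list(reference)
    hyp = list(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one character")
    return 1.0 - edit_distance(ref, hyp) / len(ref)
```

For example, scoring the hypothesis "今天天汽很好" against the reference "今天天氣很好" counts one substituted character out of six. Because the score is taken over characters rather than words, it does not depend on how either string is segmented into words, which is why it suits a lexicon whose word inventory changes during adaptation.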