Investigations to minimum phone error training in bilingual speech recognition

Authors:
Ran Xu;Qingqing Zhang;Jielin Pan;Yonghong Yan
Affiliations:
ThinkIT Speech Laboratory, Institute of Acoustics, Chinese Academy of Sciences, Beijing, China;ThinkIT Speech Laboratory, Institute of Acoustics, Chinese Academy of Sciences, Beijing, China;ThinkIT Speech Laboratory, Institute of Acoustics, Chinese Academy of Sciences, Beijing, China;ThinkIT Speech Laboratory, Institute of Acoustics, Chinese Academy of Sciences, Beijing, China
Venue:
FSKD'09 Proceedings of the 6th international conference on Fuzzy systems and knowledge discovery - Volume 4
Year:
2009

Abstract

The success of the Minimum Phone Error (MPE) training criterion in monolingual large vocabulary continuous speech recognition (LVCSR) tasks motivates us to apply it to bilingual LVCSR systems. In this paper, building on previously established techniques for constructing bilingual phoneme inventories, we comprehensively investigate the performance of MPE/fMPE on various Mandarin-English bilingual test sets under different test conditions. The evaluation results show that the final fMPE+MPE model achieves significant improvements over the baseline models. On the monolingual test sets, the best improvement is a relative error rate reduction of 28.4%. On the code-mixing test set, it achieves a relative error rate reduction of 8.1%. The within- and cross-language substitution error rates introduced in this paper also show explicitly that fMPE/MPE training effectively improves the model's within- and cross-language discriminability in our bilingual recognition tasks.
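The two kinds of figures quoted in the abstract can be illustrated with a minimal Python sketch. The relative error rate reduction follows the usual definition, (baseline - new) / baseline. The abstract does not define the within- and cross-language substitution error rates, so the split below is only an assumption: a substitution counts as "within" when the reference and hypothesis units carry the same language tag and as "cross" otherwise, normalised by the number of aligned reference units. The function names, the `unit_language` mapping, and all numbers in the usage lines are hypothetical and are not taken from the paper.

```python
from collections import Counter


def relative_error_reduction(baseline_err, new_err):
    """Relative reduction of an error rate, as a fraction of the baseline."""
    if baseline_err <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline_err - new_err) / baseline_err


def substitution_rates(aligned_pairs, unit_language):
    """Split substitutions into within- and cross-language rates.

    aligned_pairs: list of (reference_unit, hypothesis_unit) pairs taken from
        an alignment (insertions and deletions are assumed to be excluded).
    unit_language: dict mapping each unit to a language tag (hypothetical).
    """
    counts = Counter()
    total = 0
    for ref, hyp in aligned_pairs:
        total += 1
        if ref == hyp:
            continue  # correct unit, not a substitution
        same_language = unit_language[ref] == unit_language[hyp]
        counts["within" if same_language else "cross"] += 1
    if total == 0:
        return {"within": 0.0, "cross": 0.0}
    return {kind: counts[kind] / total for kind in ("within", "cross")}


# Made-up numbers, for illustration only.
reduction = relative_error_reduction(20.0, 15.0)

languages = {"a_zh": "zh", "b_zh": "zh", "x_en": "en"}
pairs = [("a_zh", "a_zh"), ("a_zh", "b_zh"), ("a_zh", "x_en"), ("x_en", "x_en")]
rates = substitution_rates(pairs, languages)
```

With these made-up inputs, an error rate falling from 20.0 to 15.0 gives a relative reduction of 0.25, and one within-language plus one cross-language substitution out of four aligned units gives a rate of 0.25 each.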