This paper proposes a fast unsupervised acoustic model adaptation technique with efficient statistics accumulation for speech recognition. Conventional adaptation techniques accumulate acoustic statistics using the forward-backward algorithm or the Viterbi algorithm. Because both algorithms require a state sequence before the statistics can be accumulated, conventional techniques spend time determining that sequence by transcribing the target speech in advance. Instead of pre-determining the state sequence, the proposed technique reduces computation time by accumulating the statistics for each frame using state confidences within each monophone. It also rapidly selects the appropriate gender-dependent acoustic model before adaptation, and further increases accuracy by applying a power term after adaptation. Recognition experiments on spontaneous speech show that the proposed technique reduces computation time by 57.3% while providing the same accuracy as the conventional adaptation technique.
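
The idea of accumulating statistics frame by frame, without first fixing a state sequence, can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes one-dimensional Gaussian states, takes the "state confidence" to be the per-frame posterior over states under uniform priors, and finishes with a generic MAP-style mean update. The function names, the `tau` prior weight, and the toy data are all hypothetical, and the gender selection and power term are left out.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian (assumed state output model)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def state_confidences(x, means, variances):
    """Per-frame confidence of each state: posterior under uniform priors.

    This definition of confidence is an assumption made for illustration.
    """
    logs = [log_gauss(x, m, v) for m, v in zip(means, variances)]
    top = max(logs)  # subtract the max for numerical stability
    weights = [math.exp(value - top) for value in logs]
    total = sum(weights)
    return [w / total for w in weights]


def accumulate(frames, means, variances):
    """Accumulate zeroth- and first-order statistics one frame at a time.

    No state sequence (no prior transcription pass) is required.
    """
    occupancy = [0.0] * len(means)
    first_order = [0.0] * len(means)
    for x in frames:
        for s, c in enumerate(state_confidences(x, means, variances)):
            occupancy[s] += c
            first_order[s] += c * x
    return occupancy, first_order


def map_update_means(means, occupancy, first_order, tau=10.0):
    """MAP-style interpolation between the prior mean and the data mean."""
    return [
        (tau * m + f) / (tau + o)
        for m, o, f in zip(means, occupancy, first_order)
    ]


# Toy data: two states of one hypothetical monophone and six frames.
means = [0.0, 5.0]
variances = [1.0, 1.0]
frames = [0.4, 0.6, 0.5, 5.8, 6.1, 5.9]

occ, first = accumulate(frames, means, variances)
adapted = map_update_means(means, occ, first)
print(occ, adapted)
```

Each frame contributes a total confidence of 1 spread across the states, so the occupancies sum to the number of frames. In this toy run both means move toward the frames that the corresponding state explains best.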