Real-time speaker adaptation for speech recognition on mobile devices

  • Authors:
  • Gil Ho Lee

  • Affiliations:
  • Computer Science Lab., Samsung Advanced Institute of Technology, Samsung Electronics, Yongin, Korea

  • Venue:
  • CCNC'10: Proceedings of the 7th IEEE Consumer Communications and Networking Conference
  • Year:
  • 2010


Abstract

This paper introduces a real-time speaker adaptation method for speech recognition on mobile devices. To adapt the speech recognition system to arbitrary speakers, we employ vocal tract length normalization (VTLN). In conventional VTLN, warping factors are computed by maximum likelihood estimation: every candidate warping factor is applied to the recognizer, and the factor that best fits the speaker or utterance is selected. Although this approach performs well, its computational cost makes it impractical for mobile devices. To reduce the computational effort, we employ pitch-based VTLN and simplify the pitch estimation. The proposed method achieves a relative word error rate reduction of 21.5% on Korean while running only 10.5% slower than the baseline.
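The idea of pitch-based VTLN can be sketched as follows: instead of running the recognizer once per candidate warping factor (the maximum-likelihood grid search of conventional VTLN), a single cheap pitch estimate is mapped directly to a warping factor. The sketch below is illustrative only; the autocorrelation pitch estimator, the reference pitch, and the pitch-to-warping-factor mapping constants are assumptions, not taken from the paper.

```python
import numpy as np

def estimate_pitch(frame, sample_rate=16000, fmin=60.0, fmax=400.0):
    """Crude autocorrelation pitch estimate (a stand-in for the
    simplified pitch estimator the paper describes)."""
    frame = frame - frame.mean()
    # one-sided autocorrelation of the frame
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo = int(sample_rate / fmax)          # shortest lag to consider
    hi = int(sample_rate / fmin)          # longest lag to consider
    lag = lo + int(np.argmax(ac[lo:hi]))  # pick the strongest periodicity
    return sample_rate / lag

def pitch_to_alpha(pitch_hz, ref_pitch=150.0, lo=0.88, hi=1.12):
    """Map pitch to a VTLN warping factor. Higher pitch tends to mean a
    shorter vocal tract, hence a warping factor below 1. The reference
    pitch and clipping range here are hypothetical, not from the paper."""
    alpha = ref_pitch / pitch_hz
    return float(np.clip(alpha, lo, hi))  # stay in a typical VTLN range

def warp_frequency(freqs, alpha, f_max=8000.0):
    """Simple linear frequency warping applied to filterbank center
    frequencies; real systems often use a piecewise-linear warp."""
    return np.clip(np.asarray(freqs) * alpha, 0.0, f_max)

# Example: a 200 Hz voiced frame yields one warping factor per speaker,
# avoiding a per-factor recognition pass.
t = np.arange(1024) / 16000.0
frame = np.sin(2 * np.pi * 200.0 * t)
alpha = pitch_to_alpha(estimate_pitch(frame))
warped = warp_frequency([300.0, 1000.0, 3000.0], alpha)
```

This replaces the O(number of warping factors) recognition passes of conventional ML-based VTLN with one pitch estimate per speaker or utterance, which is the source of the computational savings the abstract claims.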