Fast Adaptation of Speech and Speaker Characteristics for Enhanced Speech Recognition in Adverse Intelligent Environments

Authors:
Tobias Herbig; Franz Gerl; Wolfgang Minker
Venue:
IE '10 Proceedings of the 2010 Sixth International Conference on Intelligent Environments
Year:
2010

Citing: 0
Cited by: 3

Detection of unknown speakers in an unsupervised speech controlled system
IWSDS'10 Proceedings of the Second international conference on Spoken dialogue systems for ambient environments

Evaluation of two approaches for speaker specific speech recognition
IWSDS'10 Proceedings of the Second international conference on Spoken dialogue systems for ambient environments

Self-learning speaker identification for enhanced speech recognition
Computer Speech and Language

Abstract

In this paper we present a technique for fast adaptation of speech- and speaker-related information. Fast learning is particularly useful for the automatic personalization of speech-controlled devices, and personalizing human-computer interfaces for use in intelligent environments is an important research issue. Speech recognition is enhanced by speaker-specific profiles that are continuously adapted. We investigate fast but robust tracking of speaker characteristics together with optimal long-term adaptation, so that new speakers do not need an extensive enrollment. We present an implementation suitable for speaker-specific speech recognition in adverse intelligent environments. As an example, in-car applications such as speech-controlled navigation, hands-free telephony, and infotainment systems are investigated for embedded systems. Results for a subset of the SPEECON database validate the benefit of the presented speaker adaptation scheme for speech recognition: speaker characteristics are captured after very few utterances and, in the long run, are accurately represented. This adaptation scheme might be used to develop an unsupervised speech-controlled system comprising speech recognition and speaker identification. A unified modeling of speech and speaker characteristics is proposed.