Self-learning speaker identification for enhanced speech recognition

Authors:
Tobias Herbig; Franz Gerl; Wolfgang Minker
Affiliations:
University of Ulm, Institute of Information Technology, Ulm, Germany and Nuance Communications Aachen GmbH, Ulm, Germany; SVOX Deutschland GmbH, Ulm, Germany; University of Ulm, Institute of Information Technology, Ulm, Germany
Venue:
Computer Speech and Language
Year:
2012

Abstract

A novel approach for joint speaker identification and speech recognition is presented in this article. Unsupervised speaker tracking and automatic adaptation of the human-computer interface are achieved through the interaction of speaker identification, speech recognition and speaker adaptation for a limited number of recurring users. A compact model of speech and speaker characteristics is presented together with a technique for efficient information retrieval. Applying speaker-specific profiles allows speech recognition to take individual speech characteristics into account and thereby achieve higher recognition rates. Speaker profiles are initialized and continuously adapted by a balanced strategy of short-term and long-term speaker adaptation, combined with robust speaker identification. The resulting self-learning speech-controlled system can track different users. Only a very short enrollment is required for each speaker; subsequent utterances are used for unsupervised adaptation, which continuously improves speech recognition rates. Additionally, the detection of unknown speakers is examined with the aim of avoiding the need to train new speaker profiles explicitly. The speech-controlled system presented here is suitable for in-car applications on embedded devices, e.g. speech-controlled navigation, hands-free telephony or infotainment systems. Results are presented for a subset of the SPEECON database. They validate the benefit of the speaker adaptation scheme and the unified modeling in terms of both speaker identification and speech recognition rates.
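The interplay summarized in the abstract (identify the speaker, adapt that speaker's profile, flag speakers who match no profile) can be illustrated with a small sketch. This is a hypothetical simplification, not the authors' method. Each speaker profile is reduced to a single diagonal Gaussian over feature frames. Unknown speakers are flagged with an assumed log-likelihood-ratio threshold against a speaker-independent background model. The balance between short-term and long-term adaptation is approximated by interpolating a prior mean with the maximum-likelihood mean using the weight alpha = n / (n + eta), where n is the number of adaptation frames and eta is an assumed smoothing constant. All names (`SpeakerProfile`, `identify`, `adapt`, `threshold`, `eta`) are illustrative.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class SpeakerProfile:
    """Hypothetical, simplified profile: one diagonal Gaussian over feature frames."""
    name: str
    mean: list[float]
    var: list[float]
    n_frames: int = 0  # adaptation frames accumulated so far
    frame_sum: list[float] = field(default_factory=list)  # running sum for the ML mean


def log_likelihood(frames: list[list[float]], mean: list[float], var: list[float]) -> float:
    """Average per-frame log-likelihood of the frames under a diagonal Gaussian."""
    if not frames:
        raise ValueError("need at least one frame")
    total = 0.0
    for x in frames:
        for xi, mu, v in zip(x, mean, var):
            total += -0.5 * (math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v)
    return total / len(frames)


def identify(frames, profiles, background, threshold):
    """Return the best-scoring profile, or None to flag a possibly unknown speaker.

    Each profile is scored by its log-likelihood ratio against a speaker-independent
    background model; 'threshold' is an assumed tuning parameter.
    """
    bg_score = log_likelihood(frames, background.mean, background.var)
    best, best_score = None, -math.inf
    for profile in profiles:
        score = log_likelihood(frames, profile.mean, profile.var) - bg_score
        if score > best_score:
            best, best_score = profile, score
    return best if best_score >= threshold else None


def adapt(profile, frames, prior_mean, eta=50.0):
    """Update the profile mean by interpolating a prior mean with the ML mean.

    alpha = n / (n + eta): with few frames the prior dominates (robust, short-term
    behaviour); as data accumulates the ML mean dominates (detailed, long-term
    behaviour). Variances are left unchanged in this sketch. Returns alpha.
    """
    dim = len(prior_mean)
    if not profile.frame_sum:
        profile.frame_sum = [0.0] * dim
    for x in frames:
        for d in range(dim):
            profile.frame_sum[d] += x[d]
    profile.n_frames += len(frames)
    n = profile.n_frames
    if n == 0:
        return 0.0  # no data yet: keep the current mean
    alpha = n / (n + eta)
    ml_mean = [s / n for s in profile.frame_sum]
    profile.mean = [(1.0 - alpha) * p + alpha * m for p, m in zip(prior_mean, ml_mean)]
    return alpha
```

For example, with a background model centred at the origin and two profiles centred at (2, 2) and (-2, -2), frames near (2, 2) are assigned to the first profile, while frames near the origin score below a positive threshold against every profile and return `None`, which a system could treat as a cue to create a new profile. The weight n / (n + eta) is one common way to move smoothly from a robust prior-dominated estimate to a data-driven one; the value of eta here is arbitrary.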