Evaluation of two approaches for speaker specific speech recognition

Authors:
Tobias Herbig;Franz Gerl;Wolfgang Minker
Affiliations:
Nuance Communications Aachen GmbH, Ulm, Germany and University of Ulm, Institute of Information Technology, Ulm, Germany;Harman/Becker Automotive Systems GmbH, Ulm, Germany;University of Ulm, Institute of Information Technology, Ulm, Germany
Venue:
IWSDS'10 Proceedings of the Second international conference on Spoken dialogue systems for ambient environments
Year:
2010

Abstract

In this paper, we examine two approaches for the automatic personalization of speech-controlled systems. Speech recognition may be significantly improved by continuous speaker adaptation if the speaker can be reliably tracked. We evaluate two speaker identification approaches suitable for identifying 5-10 recurring users even in adverse environments, where only a very limited amount of speaker-specific data is available for training. In the first approach, a standard speaker identification method is extended by speaker-specific speech recognition. Multiple recognitions of speaker identity and spoken text are avoided to reduce latency and computational complexity. In the second approach, the speech recognizer itself is used to decode spoken phrases and to identify the current speaker in a single step. The latter approach is advantageous for applications that must run on embedded devices, e.g. speech-controlled navigation in automobiles. Both approaches were evaluated on a subset of the SPEECON database, which represents realistic command-and-control scenarios for in-car applications.