Genericity and portability for task-independent speech recognition

Authors:
Fabrice Lefevre;Jean-Luc Gauvain;Lori Lamel
Affiliations:
Spoken Language Processing Group, LIMSI-CNRS, BP 133, 91403 Orsay Cedex, France;Spoken Language Processing Group, LIMSI-CNRS, BP 133, 91403 Orsay Cedex, France;Spoken Language Processing Group, LIMSI-CNRS, BP 133, 91403 Orsay Cedex, France
Venue:
Computer Speech and Language
Year:
2005

Abstract

As core speech recognition technology improves, opening up a wider range of applications, genericity and portability are becoming important issues. Most of today's recognition systems are still tuned to a particular task, and porting a system to a new task (or language) requires a substantial investment of time and money, as well as human expertise. This paper addresses issues in speech recognizer portability and in the development of generic core speech recognition technology. First, the genericity of wide-domain models is assessed by evaluating their performance on several tasks of varied complexity. Then, techniques aimed at enhancing the genericity of these wide-domain models are investigated. Multi-source acoustic training is shown to reduce the performance gap between task-independent and task-dependent acoustic models and, for some tasks, to outperform task-dependent acoustic models. Transparent methods for porting generic models to a specific task are also explored. Transparent unsupervised acoustic model adaptation is contrasted with supervised adaptation, and incremental unsupervised adaptation of both the acoustic and linguistic models is investigated. Experimental results on a dialog task show that, with the proposed scheme, a transparently adapted generic system can perform nearly as well (about a 1% absolute gap in word error rate) as a task-specific system trained on several tens of hours of manually transcribed data.
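
The abstract states its main result as an absolute gap in word error rate (WER). As a reading aid, the following is a minimal sketch of how WER and an absolute gap between two systems are conventionally computed. It is not the authors' scoring tool: the function names and the example sentences are hypothetical, and the standard definition (word-level edit distance divided by the number of reference words) is assumed.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER: word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words to an empty hypothesis: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def absolute_gap_points(wer_a: float, wer_b: float) -> float:
    """Return the absolute WER difference in percentage points."""
    return abs(wer_a - wer_b) * 100.0


if __name__ == "__main__":
    # Hypothetical utterance and system outputs, for illustration only.
    reference = "show me flights from boston to denver"
    adapted_generic = "show me flight from boston denver"
    task_specific = "show me flights from boston to denver"
    gap = absolute_gap_points(
        word_error_rate(reference, adapted_generic),
        word_error_rate(reference, task_specific),
    )
    print(f"absolute WER gap: {gap:.1f} points")
```

An absolute gap is a difference in percentage points between two WER values, which is distinct from a relative reduction expressed as a fraction of the baseline WER.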