An investigation of single-pass ASR system combination for spoken language understanding

Authors:
Fethi Bougares;Mickael Rouvier;Nathalie Camelin;Paul Deléglise;Yannick Estève
Affiliations:
LIUM - University of Le Mans, France;LIUM - University of Le Mans, France;LIUM - University of Le Mans, France;LIUM - University of Le Mans, France;LIUM - University of Le Mans, France
Venue:
SLSP'13 Proceedings of the First international conference on Statistical Language and Speech Processing
Year:
2013

Citing 2
Cited 0

Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Comparing Stochastic Approaches to Spoken Language Understanding in Multiple Languages

IEEE Transactions on Audio, Speech, and Language Processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper studies the benefits provided by a single-pass Automatic Speech Recognition (ASR) exchange-based combination approach for spoken dialog system. Three famous open-source ASR systems are used to experiment this approach in the framework of Spoken Language Understanding (SLU). On the ASR side, single-pass ASR systems are used with an online acoustic model adaptation using the previous utterances said by a speaker. On the SLU side, a competitive CRF-based SLU system is applied on outputs of ASR system to obtain the semantic concepts. The evaluation is done on the French PORT-MEDIA test data in terms of both Word Error Rate (WER) and Concept Error Rate (CER). While the best single pass system used alone shows a CER of 29.8% for a WER of 22.8%, single-pass ASR exchange-based combination reaches a CER of 27.3% for a WER of 26%. This CER is only slightly higher than the one reached by a 5-passes ASR system which obtained a CER of 26.8% for a WER of 22.8% in better conditions, i.e. better acoustic model adaptation made on all the speech utterances said by a speaker, advanced feature extraction techniques and search graph rescoring using language model with higher order.