Improving identification accuracy by extending acceptable utterances in spoken dialogue system using barge-in timing

Authors:
Kyoko Matsuyama;Kazunori Komatani;Toru Takahashi;Tetsuya Ogata;Hiroshi G. Okuno
Affiliations:
Graduate School of Informatics, Kyoto University, Kyoto, Japan (all five authors)
Venue:
IEA/AIE'10 Proceedings of the 23rd international conference on Industrial engineering and other applications of applied intelligent systems - Volume Part II
Year:
2010

Abstract

We describe a novel dialogue strategy that enables robust interaction in noisy environments, where automatic speech recognition (ASR) results are not necessarily reliable. We have developed a method that exploits utterance timing together with ASR results to interpret user intention, that is, to identify which of the items enumerated by the system the user wants to indicate. The timing of utterances containing referential expressions is approximated by a gamma distribution, which is integrated with the ASR results by expressing both as probabilities. In this paper, we extend the method to improve identification accuracy. First, we enable the interpretation of utterances that include ordinal numbers, which appeared several times in the data we collected from users. We then use appropriate acoustic models and parameters, improving identification accuracy by 4.0% in total. We also show that Latent Semantic Mapping (LSM) enables more expressions to be handled in our framework.
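The abstract only outlines how barge-in timing and ASR results are combined. The Python sketch below illustrates the general idea under stated assumptions. Each enumerated item starts at a known onset time, and the delay between that onset and the user's barge-in is scored with a gamma density. The ASR result is reduced to a probability over items, and the two terms are multiplied and renormalized. The parameter values (`shape`, `scale`, `floor`, `confidence`), the product-and-normalize combination, and the helpers `gamma_pdf`, `identify_item` and `ordinal_asr_probs` are hypothetical illustrations. They are not the authors' implementation or fitted values, and LSM is not modeled.

```python
import math
from typing import Sequence


def gamma_pdf(x: float, shape: float, scale: float) -> float:
    """Gamma density with shape k and scale theta; zero for x <= 0."""
    if x <= 0.0:
        return 0.0
    # Computed in log space for numerical stability.
    log_pdf = ((shape - 1.0) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def ordinal_asr_probs(n_items: int, ordinal: int,
                      confidence: float = 0.9) -> list[float]:
    """Hypothetical ASR term for an utterance such as "the second one".

    Puts `confidence` on the 1-based `ordinal` item and spreads the rest
    uniformly over the other items.
    """
    if not 1 <= ordinal <= n_items:
        raise ValueError("ordinal must be between 1 and n_items")
    if n_items == 1:
        return [1.0]
    rest = (1.0 - confidence) / (n_items - 1)
    probs = [rest] * n_items
    probs[ordinal - 1] = confidence
    return probs


def identify_item(
    barge_in_time: float,
    item_onsets: Sequence[float],
    asr_probs: Sequence[float],
    shape: float = 2.0,   # hypothetical gamma shape parameter
    scale: float = 0.5,   # hypothetical gamma scale parameter (seconds)
    floor: float = 1e-6,  # hypothetical floor so no item gets zero mass
) -> list[float]:
    """Posterior over enumerated items from barge-in timing and ASR.

    Timing term: gamma density of (barge-in time - onset of item i).
    ASR term: probability that the recognized words refer to item i
    (uniform for a bare referential expression such as "that one").
    The two terms are multiplied and renormalized (an assumption).
    """
    if len(item_onsets) != len(asr_probs):
        raise ValueError("item_onsets and asr_probs must have equal length")
    scores = []
    for onset, p_asr in zip(item_onsets, asr_probs):
        p_time = gamma_pdf(barge_in_time - onset, shape, scale) + floor
        scores.append(p_time * max(p_asr, floor))
    total = sum(scores)
    return [s / total for s in scores]


if __name__ == "__main__":
    onsets = [0.0, 2.0, 4.0, 6.0]  # hypothetical item onsets in seconds
    # "That one" at t = 4.8 s: the ASR term is uniform, so timing decides.
    post = identify_item(4.8, onsets, [0.25] * 4)
    print(max(range(len(post)), key=post.__getitem__))  # prints 2
    # "The second one" at t = 4.8 s: the ordinal term outweighs timing.
    post = identify_item(4.8, onsets, ordinal_asr_probs(4, 2))
    print(max(range(len(post)), key=post.__getitem__))  # prints 1
```

With a bare referential expression, the uniform ASR term leaves timing to decide. Index 2 (the third item, whose onset precedes the barge-in by 0.8 s) wins. With an ordinal such as "the second one", the ASR term concentrates on index 1 and overrides the timing preference. This shows why handling ordinal numbers can extend the range of accepted utterances.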