Multi-party human-robot interaction with distant-talking speech recognition

  • Authors:
  • Randy Gomez (Kyoto University, Kyoto, Japan); Tatsuya Kawahara (Kyoto University, Kyoto, Japan); Keisuke Nakamura (Honda Research Institute, Wako, Japan); Kazuhiro Nakadai (Honda Research Institute, Wako, Japan)

  • Venue:
  • HRI '12: Proceedings of the Seventh Annual ACM/IEEE International Conference on Human-Robot Interaction
  • Year:
  • 2012

Abstract

Speech is one of the most natural media for human communication, which makes it vital to human-robot interaction. In the real environments where robots are deployed, distant-talking speech recognition is difficult to realize because of reverberation, which degrades speech recognition and understanding and hinders seamless human-robot interaction. To mitigate this problem, traditional speech enhancement techniques optimized for human perception have been adopted to achieve robustness in human-robot interaction. However, humans and machines perceive speech differently: an improvement in speech recognition performance may not automatically translate into an improvement in the human-robot interaction experience as perceived by the users. In this paper, we propose a method for optimizing speech enhancement techniques specifically to improve automatic speech recognition (ASR), with emphasis on the human-robot interaction experience. Experimental results using real reverberant data from a multi-party conversation show that the proposed method improves the human-robot interaction experience in severely reverberant conditions compared to traditional techniques.
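The core idea in the abstract, that an enhancement setting preferred by human listeners need not be the one that maximizes recognizer accuracy, can be illustrated with a minimal sketch. The scoring functions and the parameter `alpha` below are hypothetical stand-ins for a perceptual-quality metric and an ASR accuracy measure, not the paper's actual models; the point is only that tuning against the two objectives selects different operating points.

```python
# Hypothetical sketch: tuning an enhancement strength `alpha` against two
# different objectives. Both curves are mock quadratics chosen so that the
# listener-preferred optimum (0.3) differs from the ASR-preferred one (0.7).

def perceptual_score(alpha):
    # Mock listener-preference curve: peaks at mild enhancement (alpha = 0.3).
    return 1.0 - (alpha - 0.3) ** 2

def asr_accuracy(alpha):
    # Mock recognizer-accuracy curve: peaks at stronger enhancement (alpha = 0.7).
    return 1.0 - (alpha - 0.7) ** 2

def tune(objective, grid):
    # Grid search: pick the enhancement strength maximizing the objective.
    return max(grid, key=objective)

grid = [i / 10 for i in range(11)]  # candidate strengths 0.0, 0.1, ..., 1.0
alpha_perceptual = tune(perceptual_score, grid)  # -> 0.3
alpha_asr = tune(asr_accuracy, grid)             # -> 0.7
```

Optimizing for the perceptual metric would deploy `alpha = 0.3` and leave recognition accuracy on the table, which is the mismatch the paper's method is designed to avoid.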