Identifying the addressee in human-human-robot interactions based on head pose and speech

  • Authors:
  • Michael Katzenmaier; Rainer Stiefelhagen; Tanja Schultz

  • Affiliations:
  • Universität Karlsruhe (TH), Karlsruhe, Germany; Universität Karlsruhe (TH), Karlsruhe, Germany; Carnegie Mellon University, Pittsburgh, PA

  • Venue:
  • Proceedings of the 6th International Conference on Multimodal Interfaces
  • Year:
  • 2004

Abstract

In this work we investigate the power of acoustic and visual cues, and their combination, to identify the addressee in a human-human-robot interaction. Based on eighteen audio-visual recordings of two human beings and a (simulated) robot, we discriminate the interaction of the two humans from the interaction of one human with the robot. The paper compares the results of three approaches. The first approach uses purely acoustic cues to find the addressees; low-level, feature-based cues as well as higher-level cues are examined. In the second approach we test whether the human's head pose is a suitable cue. Our results show that visually estimated head pose is a more reliable cue for identifying the addressee in the human-human-robot interaction. In the third approach we combine the acoustic and visual cues, which yields significant improvements.
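The combination step described in the abstract can be thought of as a late fusion of two per-utterance scores: one from speech-based cues and one from the estimated head pose. The sketch below is only an illustration of that idea under assumed values; the weights, the Gaussian falloff on the pan angle, the threshold, and the function names are not taken from the paper.

```python
import math


def head_pose_score(pan_deg, robot_pan_deg=0.0, sigma=15.0):
    """Score in [0, 1]: high when the speaker's head pan angle is close to
    the (assumed) direction of the robot. Gaussian falloff with an
    illustrative sigma of 15 degrees."""
    return math.exp(-((pan_deg - robot_pan_deg) ** 2) / (2.0 * sigma ** 2))


def fuse(acoustic_p_robot, visual_p_robot, w_visual=0.6):
    """Weighted late fusion of the two probabilities that the robot is the
    addressee. The weight is an assumption, not a reported value."""
    return w_visual * visual_p_robot + (1.0 - w_visual) * acoustic_p_robot


def addressee(acoustic_p_robot, pan_deg, threshold=0.5):
    """Classify one utterance as robot- or human-directed."""
    p = fuse(acoustic_p_robot, head_pose_score(pan_deg))
    return "robot" if p >= threshold else "human"


if __name__ == "__main__":
    # Utterance spoken while looking roughly toward the robot, with speech
    # features the acoustic classifier rated as robot-directed.
    print(addressee(acoustic_p_robot=0.7, pan_deg=5.0))   # -> robot
    # Utterance spoken while facing the other person.
    print(addressee(acoustic_p_robot=0.4, pan_deg=60.0))  # -> human
```

Weighting the visual score more heavily here simply mirrors the abstract's finding that head pose was the more reliable single cue; in practice such weights would be tuned on held-out recordings.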