Presentation Sensei: a presentation training system using speech and image processing

  • Authors:
  • Kazutaka Kurihara, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
  • Masataka Goto, National Institute of Advanced Industrial Science and Technology (AIST), Ibaraki, Japan
  • Jun Ogata, National Institute of Advanced Industrial Science and Technology (AIST), Ibaraki, Japan
  • Yosuke Matsusaka, National Institute of Advanced Industrial Science and Technology (AIST), Ibaraki, Japan
  • Takeo Igarashi, The University of Tokyo, Tokyo, Japan

  • Venue:
  • Proceedings of the 9th International Conference on Multimodal Interfaces
  • Year:
  • 2007

Abstract

In this paper we present a presentation training system that observes a presentation rehearsal and provides the speaker with recommendations for improving the delivery, such as speaking more slowly or looking at the audience. Our system, "Presentation Sensei", is equipped with a microphone and a camera and analyzes the presentation by combining speech and image processing techniques. Based on the results of this analysis, the system gives the speaker instant feedback on speaking rate, eye contact with the audience, and timing, and alerts the speaker when any of these indices exceeds a predefined warning threshold. After the presentation, the system generates visual summaries of the analysis results for the speaker's self-examination. Our goal is not to improve the semantic content of a presentation but to improve its delivery by reducing inappropriate basic behaviors. We asked a few test users to try the system, and they found it very useful for improving their presentations. We also compared the system's output with the observations of a human evaluator; the results show that the system successfully detected some inappropriate behaviors. The contribution of this work is to introduce a practical recognition-based human training system and to show its feasibility despite the limitations of state-of-the-art speech and video recognition technologies.