Audio-visual analysis for event understanding

  • Authors: Zhengyou Zhang
  • Affiliation: Microsoft Research, Redmond, WA, USA
  • Venue: AMC '09: Proceedings of the 2009 Workshop on Ambient Media Computing
  • Year: 2009


Abstract

Endowing machines with human-like seeing and hearing capabilities has long been an endeavor for generations of scientists. Although seeing and hearing come naturally to humans, they remain a formidable challenge for computers. In this talk, I will focus on event understanding using audio and visual information, either separately or jointly. In particular, I will describe the following pieces of work conducted jointly with my collaborators: presence detection ("where", "what", "when", and "who") in an office environment using multi-sensory inputs from streams of video, audio, and computer interactions (mouse and keyboard information); active speaker detection in a meeting room using microphone arrays and camera arrays; human action recognition with expandable graphical models; and group event recognition with a varying number of group members and limited training data.
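The abstract gives no implementation details for the presence-detection work, but the idea of fusing video, audio, and mouse/keyboard streams can be illustrated with a minimal sketch. Everything below is hypothetical: the class, the weights, and the threshold are illustrative choices, not the authors' actual method.

```python
from dataclasses import dataclass

@dataclass
class SensorEvidence:
    """Per-stream confidence in [0, 1] that someone is present (hypothetical)."""
    video_motion: float    # e.g. frame-difference energy from the camera
    audio_activity: float  # e.g. speech/noise energy from the microphone
    hid_activity: float    # recent mouse/keyboard events

def presence_score(e: SensorEvidence,
                   weights: tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Weighted linear fusion of the three streams (illustrative weights)."""
    w_v, w_a, w_h = weights
    return w_v * e.video_motion + w_a * e.audio_activity + w_h * e.hid_activity

def is_present(e: SensorEvidence, threshold: float = 0.4) -> bool:
    """Declare presence when the fused score clears a fixed threshold."""
    return presence_score(e) >= threshold
```

For example, strong camera motion alone (`video_motion=0.9`) clears the threshold, while keyboard activity alone (`hid_activity=1.0`) does not; in practice such weights and thresholds would be learned rather than hand-set.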