Audio-visual analysis for event understanding

Authors: Zhengyou Zhang
Affiliations: Microsoft Research, Redmond, WA, USA
Venue: AMC '09 Proceedings of the 2009 workshop on Ambient media computing
Year: 2009

Abstract

Endowing machines with human-like seeing and hearing capabilities has long been an endeavor for generations of scientists. Although seeing and hearing come naturally to humans, achieving them with a computer is a formidable challenge. In this talk, I will focus on event understanding using audio and visual information, either separately or jointly. In particular, I will describe the following pieces of work conducted jointly with my collaborators: presence detection ("where", "what", "when", and "who") in an office environment using multi-sensory inputs from streams of video, audio, and computer interactions (mouse and keyboard information); active speaker detection in a meeting room using microphone arrays and camera arrays; human action recognition with expandable graphical models; and group event recognition with a varying number of group members and limited training data.