Endowing machines with human-like seeing and hearing capabilities has long been an endeavor for generations of scientists. Although seeing and hearing come naturally to humans, enabling a computer to do the same is a formidable challenge. In this talk, I will focus on event understanding using audio and visual information, either separately or jointly. In particular, I will describe the following pieces of work conducted jointly with my collaborators: presence detection ("where", "what", "when", and "who") in an office environment using multi-sensory inputs from streams of video, audio, and computer interactions (mouse and keyboard information); active speaker detection in a meeting room using microphone arrays and camera arrays; human action recognition with expandable graphical models; and group event recognition with a varying number of group members and limited training data.