Automated sip detection in naturally-evoked video

Authors:
Rana el Kaliouby; Mina Mikhail
Affiliations:
Massachusetts Institute of Technology, Cambridge, MA, USA; American University in Cairo, Cairo, Egypt
Venue:
ICMI '08 Proceedings of the 10th international conference on Multimodal interfaces
Year:
2008

Abstract

Quantifying consumer experiences is an emerging application area for event detection in video. This paper presents a hierarchical model for robust sip detection that combines bottom-up processing of face videos, namely real-time head action unit analysis and head gesture recognition, with top-down knowledge about sip events and task semantics. Our algorithm achieves an average accuracy of 82% in videos that feature single sips, and an average accuracy of 78% with a false positive rate of 0.3% in more challenging videos that feature multiple sips and chewing actions. We discuss how our methodology generalizes to detecting other events in similar contexts.