Automated extraction of object- and event-metadata from gesture video using a Bayesian network

  • Authors:
  • Dimitrios I. Kosmopoulos

  • Affiliations:
  • National Centre for Scientific Research "Demokritos", Institute of Informatics & Telecommunications, Aghia Paraskevi, Greece

  • Venue:
  • ICANN'05: Proceedings of the 15th International Conference on Artificial Neural Networks: Formal Models and Their Applications - Volume Part II
  • Year:
  • 2005


Abstract

In this work a method for metadata extraction from sign language videos is proposed, employing high-level domain knowledge. The metadata concern the depicted objects (the head and the right/left hand) and the occlusion events, which are essential for interpretation and therefore for subsequent higher-level semantic indexing. Occlusions between the two hands, between a hand and the head, and between a hand and the body can easily confuse the metadata extractor and consequently lead to wrong gesture interpretation. Therefore, a Bayesian network is employed to bridge the gap between high-level knowledge about valid spatiotemporal configurations of the human body and the metadata extractor. The approach is applied here to sign-language videos, but it can be generalized to video indexing based on gestures.
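The fusion idea described in the abstract can be illustrated with a toy example. The following is a minimal sketch, not the paper's actual network: a two-node discrete Bayesian network in which a hidden occlusion state (drawn from a prior encoding domain knowledge about body configurations) explains low-level tracker evidence, here an assumed observed count of skin-colored blobs. All node names, states, and probabilities are hypothetical and chosen only to show the Bayes-rule inference step.

```python
# Toy Bayesian network: Occlusion -> BlobCount.
# Prior over occlusion states (illustrative domain knowledge):
PRIOR_OCCLUSION = {"none": 0.70, "hand_hand": 0.15, "hand_head": 0.15}

# P(observed blob count | occlusion state); with no occlusion the tracker
# typically sees three blobs (head + two hands), with an occlusion fewer.
# These conditional tables are made up for illustration.
LIKELIHOOD_BLOBS = {
    "none":      {3: 0.90, 2: 0.08, 1: 0.02},
    "hand_hand": {3: 0.05, 2: 0.85, 1: 0.10},
    "hand_head": {3: 0.05, 2: 0.80, 1: 0.15},
}

def posterior_occlusion(blob_count):
    """Posterior P(occlusion | blob count) via Bayes' rule."""
    joint = {s: PRIOR_OCCLUSION[s] * LIKELIHOOD_BLOBS[s].get(blob_count, 0.0)
             for s in PRIOR_OCCLUSION}
    z = sum(joint.values())  # normalizing constant P(blob_count)
    return {state: p / z for state, p in joint.items()}

# Tracker reports only two blobs: the network favors an occlusion event
# even though the prior favors "none".
post = posterior_occlusion(2)
print(max(post, key=post.get))
```

The point of the sketch is that the prior lets high-level knowledge veto implausible low-level interpretations, which is the role the Bayesian network plays between the knowledge base and the metadata extractor.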