Gesture Recognition using Hidden Markov Models from Fragmented Observations

Authors:
Ruiduo Yang;Sudeep Sarkar
Affiliations:
University of South Florida;University of South Florida
Venue:
CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 1
Year:
2006

Citing 0
Cited 7

Coupled grouping and matching for sign and gesture recognition (Computer Vision and Image Understanding)
A shared parameter model for gesture and sub-gesture analysis (IWCIA'11 Proceedings of the 14th international conference on Combinatorial image analysis)
3D gestural interaction for stereoscopic visualization on mobile devices (CAIP'11 Proceedings of the 14th international conference on Computer analysis of images and patterns - Volume Part II)
An application oriented and shape feature based multi-touch gesture description and recognition method (Multimedia Tools and Applications)
Experiencing real 3D gestural interaction with mobile devices (Pattern Recognition Letters)
Gesture recognition using neural networks based on HW/SW cosimulation platform (Advances in Software Engineering)
Most discriminating segment - Longest common subsequence (MDSLCS) algorithm for dynamic hand gesture classification (Pattern Recognition Letters)

Abstract

We consider the problem of computing the likelihood of a gesture from regular, unaided video sequences, without relying on perfect segmentation of the scene. Instead of requiring that low- and mid-level processes produce near-perfect segmentation of relevant body parts such as hands, we take into account that such processes can only produce uncertain information. The hands can only be detected as fragmented regions along with clutter. To address this problem, we propose an extension of the HMM formalism, which we call the frag-HMM, to allow for reasoning based on fragmented observations, via the use of an intermediate grouping process. In this formulation, we do not match the frag-HMM to one observation sequence, but rather to a sequence of observation sets, where each observation set is a collection of groups of fragmented observations. Based on the developed model, we show how to perform three kinds of computations. The first is to decide on the best observation group for each frame, given a sequence of observation groups for the past frames. This allows us to incrementally compute the best segmentation of the hand for each frame, given the model. The second is the computation of the likelihood of a sequence, averaged over all possible state sequences and possible groupings. The third is the computation of the likelihood of a sequence, maximized over all possible state sequences and group sequences. This can also give us the best possible grouping for each frame. We demonstrate our ideas using a publicly available hand gesture dataset that spans different subjects, is set against a complex background, and involves hand occlusions. The recognition performance is within 2% of that obtained with manually segmented hands and about 10% better than that obtained with segmentations that use prior knowledge of the hand color.
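
The second and third computations described in the abstract can be illustrated with a small sketch. This is not the paper's frag-HMM formulation: it assumes, purely for illustration, that the candidate groups in each frame are equally likely a priori and independent across frames, so that averaging over groupings reduces to a per-frame mean of emission likelihoods and maximizing reduces to a per-frame maximum. The function names and the `frames[t][g][j]` layout (likelihood of candidate group `g` at frame `t` under state `j`) are hypothetical.

```python
def forward_likelihood(pi, A, frames):
    """Sequence likelihood averaged over state sequences and groupings.

    Assumption (not from the paper): uniform, frame-independent prior over
    the candidate groups of each frame.
    """
    n = len(pi)

    def avg_emission(frame, j):
        # Mean emission likelihood of state j over the frame's candidate groups.
        return sum(group[j] for group in frame) / len(frame)

    alpha = [pi[j] * avg_emission(frames[0], j) for j in range(n)]
    for frame in frames[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * avg_emission(frame, j)
            for j in range(n)
        ]
    return sum(alpha)


def best_states_and_groups(pi, A, frames):
    """Viterbi-style maximization over state sequences and group sequences."""
    n = len(pi)

    def best_emission(frame, j):
        # Highest emission likelihood of state j among the candidate groups.
        return max(group[j] for group in frame)

    delta = [pi[j] * best_emission(frames[0], j) for j in range(n)]
    back = []  # back[t-1][j] = best predecessor of state j at frame t
    for frame in frames[1:]:
        prev = [max(range(n), key=lambda i: delta[i] * A[i][j]) for j in range(n)]
        delta = [
            delta[prev[j]] * A[prev[j]][j] * best_emission(frame, j)
            for j in range(n)
        ]
        back.append(prev)

    # Backtrack the best state sequence from the best final state.
    state = max(range(n), key=lambda j: delta[j])
    score = delta[state]
    states = [state]
    for prev in reversed(back):
        state = prev[state]
        states.append(state)
    states.reverse()

    # Under the independence assumption, the best group in each frame is the
    # one with the highest likelihood under that frame's decoded state.
    groups = [
        max(range(len(frame)), key=lambda g: frame[g][s])
        for frame, s in zip(frames, states)
    ]
    return score, states, groups


# Toy example: 2 states, 2 frames, 2 candidate groups per frame.
pi = [1.0, 0.0]
A = [[0.5, 0.5], [0.0, 1.0]]
frames = [
    [[0.8, 0.1], [0.2, 0.3]],
    [[0.1, 0.2], [0.4, 0.6]],
]
avg_score = forward_likelihood(pi, A, frames)
max_score, states, groups = best_states_and_groups(pi, A, frames)
```

In this toy setting the decoded `groups` list plays the role of a per-frame segmentation choice. A fuller model would likely need dependencies between groups in consecutive frames, which this sketch deliberately leaves out.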