We consider the problem of computing the likelihood of a gesture from regular, unaided video sequences, without relying on perfect segmentation of the scene. Instead of requiring that low- and mid-level processes produce near-perfect segmentation of relevant body parts such as hands, we take into account that such processes can only produce uncertain information: the hands can be detected only as fragmented regions, along with clutter. To address this problem, we propose an extension of the HMM formalism, which we call the frag-HMM, that allows reasoning based on fragmented observations via an intermediate grouping process. In this formulation, we do not match the frag-HMM to one observation sequence, but rather to a sequence of observation sets, where each observation set is a collection of groups of fragmented observations. Based on the developed model, we show how to perform three kinds of computations. The first is to decide on the best observation group for each frame, given a sequence of observation groups for the past frames. This allows us to incrementally compute the best segmentation of the hand for each frame, given the model. The second is the computation of the likelihood of a sequence, averaged over all possible state sequences and groupings. The third is the computation of the likelihood of a sequence, maximized over all possible state sequences and group sequences, which also yields the best possible grouping for each frame. We demonstrate our ideas using a publicly available hand gesture dataset that spans different subjects, is set against complex backgrounds, and involves hand occlusions. The recognition performance is within 2% of that obtained with manually segmented hands and about 10% better than that obtained with segmentations that use prior knowledge of the hand color.
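To make the second and third computations concrete, the following is a minimal Python sketch of an HMM matched against a sequence of observation sets rather than a single observation sequence. It is not the paper's formulation: it assumes 1-D Gaussian emissions, a uniform prior over the candidate groups in each frame, group choices that are independent across frames, and at least one candidate group per frame. All names (`averaged_loglik`, `viterbi_over_groups`, the toy parameters) are hypothetical.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian (stand-in emission model, an assumption)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def logsumexp(vals):
    """Numerically stable log(sum(exp(v))) for a list of log values."""
    m = max(vals)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in vals))

def emission_table(obs_sets, means, variances):
    """E[t][j][g] = log b_j(group g of frame t)."""
    return [[[log_gauss(g, means[j], variances[j]) for g in groups]
             for j in range(len(means))] for groups in obs_sets]

def averaged_loglik(obs_sets, log_pi, log_A, means, variances):
    """Forward pass: sum over state sequences and over group choices,
    with each group in a frame weighted by a uniform prior 1/K_t."""
    n = len(log_pi)
    E = emission_table(obs_sets, means, variances)

    def emit(t, j):
        return logsumexp(E[t][j]) - math.log(len(E[t][j]))

    alpha = [log_pi[j] + emit(0, j) for j in range(n)]
    for t in range(1, len(obs_sets)):
        alpha = [logsumexp([alpha[i] + log_A[i][j] for i in range(n)]) + emit(t, j)
                 for j in range(n)]
    return logsumexp(alpha)

def viterbi_over_groups(obs_sets, log_pi, log_A, means, variances):
    """Viterbi pass: max over state sequences and group choices.
    Returns (log score, state path, chosen group index per frame)."""
    n = len(log_pi)
    E = emission_table(obs_sets, means, variances)
    # Best group for every (frame, state) pair.
    pick = [[max(range(len(row)), key=row.__getitem__) for row in frame]
            for frame in E]

    def emit(t, j):
        return E[t][j][pick[t][j]] - math.log(len(E[t][j]))

    delta = [log_pi[j] + emit(0, j) for j in range(n)]
    back = []
    for t in range(1, len(obs_sets)):
        prev = [max(range(n), key=lambda i, j=j: delta[i] + log_A[i][j])
                for j in range(n)]
        delta = [delta[prev[j]] + log_A[prev[j]][j] + emit(t, j) for j in range(n)]
        back.append(prev)
    state = max(range(n), key=delta.__getitem__)
    score = delta[state]
    states = [state]
    for prev in reversed(back):  # backtrack from the last frame
        state = prev[state]
        states.append(state)
    states.reverse()
    groups = [pick[t][s] for t, s in enumerate(states)]
    return score, states, groups

# Toy example (all values are made up): two states, four frames,
# each frame offering a different number of candidate groups.
log_pi = [math.log(0.9), math.log(0.1)]
log_A = [[math.log(0.7), math.log(0.3)], [math.log(0.1), math.log(0.9)]]
means, variances = [0.0, 5.0], [1.0, 1.0]
obs_sets = [[0.1, 9.0], [7.5, 0.3, -4.0], [4.8, 12.0], [5.2]]

avg = averaged_loglik(obs_sets, log_pi, log_A, means, variances)
score, states, groups = viterbi_over_groups(obs_sets, log_pi, log_A, means, variances)
print(states, groups)
```

Under these assumptions the group choice factors out per frame, so the averaged likelihood reduces to a standard forward pass with a mixture emission, and the maximized likelihood to a standard Viterbi pass with a per-state best group. A model that couples group choices across frames would need the group index added to the dynamic-programming state.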