Distribution-Based Dimensionality Reduction Applied to Articulated Motion Recognition

Authors:
Sunita Nayak;Sudeep Sarkar;Barbara Loeding
Affiliations:
Photometria Inc., San Diego;University of South Florida, Tampa;University of South Florida, Lakeland
Venue:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Year:
2009

Abstract

Some articulated motion representations rely on frame-wise abstractions of the statistical distribution of low-level features such as orientation, color, or relational distributions. As the configuration among parts changes with articulated motion, the distribution changes, tracing a trajectory in the latent space of distributions, which we call the configuration space. These trajectories can then be used for recognition using standard techniques such as dynamic time warping. The core theory in this paper concerns embedding the frame-wise distributions, which can be viewed as probability functions, into a low-dimensional space so that we can estimate various meaningful probabilistic distances, such as the Chernoff, Bhattacharyya, Matusita, Kullback-Leibler (KL), or symmetric-KL distances, based on dot products between points in this space. Apart from computational advantages, this representation also affords speed-normalized matching of motion signatures. Speed-normalized representations can be formed by interpolating the configuration trajectories along their arc lengths, without using any knowledge of the temporal scale variations between the sequences. We experiment with five different probabilistic distance measures and show the usefulness of the representation in three different contexts: sign recognition (with a large number of possible classes), gesture recognition (with person variations), and classification of human-human interaction sequences (with segmentation problems). We find that choosing the right distance measure for each situation is important. The low-dimensional embedding makes matching two to three times faster, while achieving recognition accuracies that are close to those obtained without using a low-dimensional embedding. We also empirically establish the robustness of the representation with respect to low-level parameters, embedding parameters, and temporal-scale parameters.
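
Two ideas from the abstract can be illustrated with a minimal sketch. It is not the paper's embedding, which is not specified here. The first part uses a well-known special case: the Bhattacharyya coefficient between two discrete distributions equals the dot product of their element-wise square roots. The second part resamples a trajectory at equal arc-length steps, which is one plausible reading of "interpolating the configuration trajectories along their arc lengths". All function names and the toy data are hypothetical.

```python
import math


def sqrt_embed(hist):
    """Normalize a histogram and map each probability to its square root."""
    total = float(sum(hist))
    return [math.sqrt(h / total) for h in hist]


def bhattacharyya_coefficient(p_hist, q_hist):
    """Bhattacharyya coefficient computed as a dot product of sqrt embeddings."""
    u, v = sqrt_embed(p_hist), sqrt_embed(q_hist)
    return sum(a * b for a, b in zip(u, v))


def resample_by_arc_length(points, n_samples):
    """Resample a polyline at n_samples positions equally spaced in arc length.

    Frame timing is ignored: only the shape of the path matters, so a slow
    and a fast traversal of the same path give the same resampled points.
    """
    if len(points) < 2 or n_samples < 2:
        raise ValueError("need at least two points and two samples")
    # Cumulative arc length at each original point.
    cum = [0.0]
    for a, b in zip(points, points[1:]):
        cum.append(cum[-1] + math.dist(a, b))
    total = cum[-1]
    out, seg = [], 0
    for k in range(n_samples):
        target = total * k / (n_samples - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0 else (target - cum[seg]) / seg_len
        t = min(max(t, 0.0), 1.0)
        a, b = points[seg], points[seg + 1]
        out.append([x + t * (y - x) for x, y in zip(a, b)])
    return out


# Toy usage: the same straight path sampled at two different speeds.
slow = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.5, 0.0), (2.0, 0.0)]
fast = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
slow_resampled = resample_by_arc_length(slow, 5)
fast_resampled = resample_by_arc_length(fast, 5)
```

After resampling, the two toy trajectories have the same number of points at matching positions along the path, so they can be compared point by point or passed to a sequence matcher such as dynamic time warping.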