Modeling human activities as speech

  • Authors:
  • Chia-Chih Chen; J. K. Aggarwal

  • Affiliations:
  • Dept. of ECE, Univ. of Texas at Austin, Austin, TX, USA (both authors)

  • Venue:
  • CVPR '11 Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition
  • Year:
  • 2011

Abstract

Human activity recognition and speech recognition appear to be two loosely related research areas. On closer examination, however, there are several analogies between activity and speech signals in the way they are generated, propagated, and perceived. In this paper, we propose a novel action representation, the action spectrogram, which is inspired by a common spectrographic representation of speech. Unlike a sound spectrogram, an action spectrogram is a space-time-frequency representation that characterizes the short-time spectral properties of body part movements. Whereas the essence of the speech signal is the variation of air pressure over time, our method models activities as time series of likelihoods of action-associated local interest patterns. This low-level process is realized by learning boosted window classifiers from spatially quantized spatio-temporal interest features. We have tested our algorithm on a variety of human activity datasets and achieved superior results.
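
The following is a minimal, illustrative sketch of the spectrographic idea described in the abstract, not the authors' implementation: it applies a short-time Fourier analysis to per-body-part likelihood time series and stacks the results into a space-time-frequency array. The body-part names, window parameters, and synthetic likelihood signals are all hypothetical placeholders; in the paper these series would come from boosted window classifiers over spatially quantized spatio-temporal interest features.

```python
# Illustrative sketch of an "action spectrogram"-style representation.
# All signals and parameters below are synthetic assumptions for demonstration.
import numpy as np

def short_time_spectrum(series, win_len=32, hop=8):
    """Magnitude spectra of overlapping, Hann-windowed segments of a 1-D series."""
    window = np.hanning(win_len)
    frames = []
    for start in range(0, len(series) - win_len + 1, hop):
        segment = series[start:start + win_len] * window
        frames.append(np.abs(np.fft.rfft(segment)))
    return np.array(frames)  # shape: (num_windows, win_len // 2 + 1)

# Hypothetical likelihood time series for two body parts
# (stand-ins for classifier outputs; ~2 Hz arm swing, ~1 Hz stride).
fps, seconds = 30, 4
t = np.arange(fps * seconds) / fps
parts = {
    "arm": 0.5 + 0.4 * np.sin(2 * np.pi * 2.0 * t),
    "leg": 0.5 + 0.4 * np.sin(2 * np.pi * 1.0 * t),
}

# Stack per-part spectrograms into a space-time-frequency descriptor.
action_spectrogram = np.stack(
    [short_time_spectrum(series) for series in parts.values()], axis=0
)
print(action_spectrogram.shape)  # (num_parts, num_windows, num_freq_bins)
```

In this toy setup, the first axis indexes body parts (the "space" dimension), the second indexes short-time windows, and the third indexes frequency bins, mirroring the space-time-frequency structure the abstract attributes to the action spectrogram.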