The paper studies five methods for combining multi-view data from an uncalibrated smart camera network for human activity recognition. The multi-view classification scenarios fall into two categories: view selection and view fusion. Selection classifies using a single view, whereas fusion merges multi-view data at either the feature or the label level. The five methods are compared on the task of classifying human activities in three fully annotated datasets (MAS, VIHASI and HOMELAB) and a combined dataset (MAS+VIHASI). Classification is performed on image features computed from silhouette images, using a binary-tree-structured classifier with a 1D CRF for temporal modeling. The results show that fusion methods outperform practical selection methods. Selection methods have their advantages, but they depend strongly on the quality of the selection criterion and on how well that criterion adapts to different environments. Furthermore, feature-level fusion outperforms the other scenarios in more controlled settings, while under greater variability in camera placement and subject characteristics, combining candidate labels is more likely to improve multi-view activity recognition accuracy.
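The distinction between the two fusion levels can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, feature shapes, and the concatenation/majority-vote strategies are assumptions chosen to show the general idea of feature-level versus label-level combination.

```python
import numpy as np

def feature_fusion(view_features):
    """Feature-level fusion (sketch): concatenate per-view feature
    vectors into one joint vector before classification."""
    return np.concatenate(view_features)

def label_fusion(view_labels):
    """Label-level fusion (sketch): majority vote over candidate
    labels predicted independently from each view."""
    values, counts = np.unique(view_labels, return_counts=True)
    return values[np.argmax(counts)]

# Hypothetical example: three camera views of the same frame sequence.
feats = [np.array([0.1, 0.4]), np.array([0.3, 0.2]), np.array([0.5, 0.9])]
joint = feature_fusion(feats)            # one 6-dimensional joint vector
labels = ["walk", "walk", "run"]         # per-view candidate labels
fused = label_fusion(labels)             # majority label: "walk"
```

Feature-level fusion gives the classifier access to all views at once but assumes consistent camera setups, whereas label-level fusion only combines per-view decisions, which matches the paper's finding that label combination is more robust when camera placement and subjects vary.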