Multi-view action recognition using local similarity random forests and sensor fusion

Authors:
Fan Zhu;Ling Shao;Mingxiu Lin
Affiliations:
Department of Electronic and Electrical Engineering, The University of Sheffield, UK;Department of Electronic and Electrical Engineering, The University of Sheffield, UK;College of Information Science and Engineering, Northeastern University, China
Venue:
Pattern Recognition Letters
Year:
2013

Citing 8

Random Forests
Machine Learning

Free viewpoint action recognition using motion history volumes
Computer Vision and Image Understanding - Special issue on modeling people: Vision-based understanding of a person's shape, appearance, movement, and behaviour

Cross-View Action Recognition from Temporal Self-similarities
ECCV '08 Proceedings of the 10th European Conference on Computer Vision: Part II

More generality in efficient multiple kernel learning
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning

Making action recognition robust to occlusions and viewpoint changes
ECCV'10 Proceedings of the 11th European Conference on Computer Vision: Part III

Are current monocular computer vision systems for human action recognition suitable for visual surveillance applications?
ISVC'11 Proceedings of the 7th international conference on Advances in visual computing - Volume Part II

Sampling strategies for bag-of-features image classification
ECCV'06 Proceedings of the 9th European conference on Computer Vision - Volume Part IV

Fusion of single view soft k-NN classifiers for multicamera human action recognition
HAIS'10 Proceedings of the 5th international conference on Hybrid Artificial Intelligence Systems - Volume Part II

Cited 1

Spatiotemporal bag-of-features for early wildfire smoke detection
Image and Vision Computing

Abstract

This paper addresses the multi-view action recognition problem with a local segment similarity voting scheme, upon which we build a novel multi-sensor fusion method. The recently proposed random forests classifier is used to map the local segment features to their corresponding prediction histograms. We compare the results of our approach with those of the baseline Bag-of-Words (BoW) and Naive-Bayes Nearest Neighbor (NBNN) methods on the multi-view IXMAS dataset. We also compare our multi-camera fusion strategy with the commonly used early feature concatenation strategy, using different camera views and different segment scales. The results show that the proposed sensor fusion technique, coupled with the random forests classifier, is effective for multi-view human action recognition.
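
The abstract does not spell out the voting or fusion rules, so the following is only a minimal sketch of one plausible reading: each local segment yields a class-probability histogram (as a random forest might produce), the histograms are averaged within each camera view, and the per-view votes are summed to pick a class. The function names, the averaging rule, and the summation across views are illustrative assumptions, not the authors' published method.

```python
from typing import Dict, List, Sequence

# One class-probability histogram per local segment (assumed classifier output).
Histogram = Sequence[float]


def vote_over_segments(histograms: List[Histogram]) -> List[float]:
    """Average the per-segment prediction histograms of one camera view."""
    if not histograms:
        raise ValueError("need at least one segment histogram")
    n_classes = len(histograms[0])
    totals = [0.0] * n_classes
    for hist in histograms:
        for c, p in enumerate(hist):
            totals[c] += p
    return [t / len(histograms) for t in totals]


def fuse_views(per_view: Dict[str, List[Histogram]]) -> int:
    """Late fusion (assumed rule): sum per-view votes, return the top class index."""
    view_votes = [vote_over_segments(h) for h in per_view.values()]
    n_classes = len(view_votes[0])
    fused = [sum(v[c] for v in view_votes) for c in range(n_classes)]
    return max(range(n_classes), key=fused.__getitem__)


# Hypothetical histograms for two camera views and three action classes.
example = {
    "cam0": [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1]],
    "cam1": [[0.2, 0.7, 0.1], [0.1, 0.8, 0.1]],
}
predicted_class = fuse_views(example)
```

In this sketch the views are combined after classification, whereas the early feature concatenation strategy mentioned in the abstract would join the per-view features into one vector before any classifier is applied.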