Fast unsupervised ego-action learning for first-person sports videos

  • Authors:
  • K. M. Kitani; T. Okabe; Y. Sato; A. Sugimoto

  • Affiliations:
  • UEC Tokyo, Tokyo, Japan; Univ. of Tokyo, Tokyo, Japan; Univ. of Tokyo, Tokyo, Japan; Nat. Inst. of Inf., Tokyo, Japan

  • Venue:
  • CVPR '11 Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition
  • Year:
  • 2011

Abstract

Portable high-quality sports cameras (e.g. head or helmet mounted) built for recording dynamic first-person video footage are becoming common items among sports enthusiasts. We address the novel task of discovering first-person action categories (which we call ego-actions), which is useful for tasks such as video indexing and retrieval. To learn ego-action categories, we investigate the use of motion-based histograms and unsupervised learning algorithms to quickly cluster video content. Our approach assumes a completely unsupervised scenario: labeled training videos are not available, videos are not pre-segmented, and the number of ego-action categories is unknown. In our proposed framework we show that a stacked Dirichlet process mixture model can be used to automatically learn a motion histogram codebook and the set of ego-action categories. We quantitatively evaluate our approach on both in-house and public YouTube videos and demonstrate robust ego-action categorization across several sports genres. Comparative analysis shows that our approach outperforms other state-of-the-art topic models with respect to both classification accuracy and computational speed. Preliminary results indicate that, on average, the categorical content of a 10 minute video sequence can be indexed in under 5 seconds.
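To make the pipeline concrete, below is a minimal sketch of the general idea: per-frame motion histograms computed from dense optical flow, clustered with a Dirichlet process mixture so the number of categories need not be fixed in advance. This is not the authors' stacked DPMM; it substitutes OpenCV's Farneback flow and scikit-learn's `BayesianGaussianMixture` (with a Dirichlet process prior) as stand-ins, and the video path is hypothetical.

```python
# Sketch: motion-histogram features + DP mixture clustering of frames.
# Assumptions: a single video file, Farneback optical flow as the motion cue,
# and sklearn's DP-prior mixture as a stand-in for the paper's stacked DPMM.
import cv2
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def motion_histograms(video_path, n_bins=8):
    """Return one flow-orientation histogram (weighted by magnitude) per frame pair."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    hists = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        h, _ = np.histogram(ang, bins=n_bins, range=(0, 2 * np.pi), weights=mag)
        hists.append(h / (h.sum() + 1e-8))  # normalize to a distribution
        prev_gray = gray
    cap.release()
    return np.array(hists)

X = motion_histograms("ski_run.mp4")  # hypothetical first-person sports clip

# Dirichlet process mixture: n_components is only an upper bound;
# the DP prior shrinks unused components toward zero weight.
dpmm = BayesianGaussianMixture(
    n_components=20,
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="diag",
    max_iter=200,
    random_state=0,
)
labels = dpmm.fit_predict(X)  # per-frame ego-action cluster assignments
print("Discovered clusters:", np.unique(labels))
```

The sequence of per-frame labels can then be smoothed or segmented to index the video by ego-action category; the paper's contribution is doing this jointly with codebook learning in a stacked model, which the single-level sketch above does not capture.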