Extracting Moving People from Internet Videos

Authors:
Juan Carlos Niebles;Bohyung Han;Andras Ferencz;Li Fei-Fei
Affiliations:
Princeton University, Princeton, USA and Universidad del Norte, Colombia;Mobileye Vision Technologies, Princeton, USA;Mobileye Vision Technologies, Princeton, USA;Princeton University, Princeton, USA
Venue:
ECCV '08 Proceedings of the 10th European Conference on Computer Vision: Part IV
Year:
2008

Citing 0
Cited 9

Short-term audio-visual atoms for generic video concept classification

MM '09 Proceedings of the 17th ACM international conference on Multimedia
Learning generic human body models

AMDO'10 Proceedings of the 6th international conference on Articulated motion and deformable objects
Object, scene and actions: combining multiple features for human action recognition

ECCV'10 Proceedings of the 11th European conference on Computer vision: Part I
Activities as time series of human postures

ECCV'10 Proceedings of the 11th European conference on Computer vision: Part II
An attempt to segment foreground in dynamic scenes

ISVC'11 Proceedings of the 7th international conference on Advances in visual computing - Volume Part I
Improving video classification via youtube video co-watch data

SBNMA '11 Proceedings of the 2011 ACM workshop on Social and behavioural networked media access
Weakly supervised learning of object segmentations from web-scale video

ECCV'12 Proceedings of the 12th international conference on Computer Vision - Volume Part I
Online web-data-driven segmentation of selected moving objects in videos

ACCV'12 Proceedings of the 11th Asian conference on Computer Vision - Volume Part II
Automatic extraction of relevant video shots of specific actions exploiting Web data

Computer Vision and Image Understanding

Quantified Score

Hi-index	0.00

Visualization

Abstract

We propose a fully automatic framework to detect and extract arbitrary human motion volumes from real-world videos collected from YouTube. Our system is composed of two stages. A person detector is first applied to provide crude information about the possible locations of humans. Then a constrained clustering algorithm groups the detections and rejects false positives based on the appearance similarity and spatio-temporal coherence. In the second stage, we apply a top-down pictorial structure model to complete the extraction of the humans in arbitrary motion. During this procedure, a density propagation technique based on a mixture of Gaussians is employed to propagate temporal information in a principled way. This method reduces greatly the search space for the measurement in the inference stage. We demonstrate the initial success of this framework both quantitatively and qualitatively by using a number of YouTube videos.