Weakly supervised learning of object segmentations from web-scale video

Authors:
Glenn Hartmann;Matthias Grundmann;Judy Hoffman;David Tsai;Vivek Kwatra;Omid Madani;Sudheendra Vijayanarasimhan;Irfan Essa;James Rehg;Rahul Sukthankar
Affiliations:
Google Research;Georgia Institute of Technology;University of California, Berkeley;Georgia Institute of Technology;Google Research;Google Research;Google Research;Georgia Institute of Technology;Georgia Institute of Technology;Google Research
Venue:
ECCV'12 Proceedings of the 12th European Conference on Computer Vision - Volume Part I
Year:
2012

Abstract

We propose to learn pixel-level segmentations of objects from weakly labeled (tagged) internet videos. Specifically, given a large collection of raw YouTube content, along with potentially noisy tags, our goal is to automatically generate spatiotemporal masks for each object, such as "dog", without employing any pre-trained object detectors. We formulate this problem as learning weakly supervised classifiers for a set of independent spatiotemporal segments. The object seeds obtained from the segment-level classifiers are then refined using graph cuts to generate high-precision object masks. Our results, obtained by training on a dataset of 20,000 YouTube videos weakly tagged into 15 classes, demonstrate automatic extraction of pixel-level object masks. Evaluating against a ground-truthed subset of 50,000 frames with pixel-level annotations, we confirm that our proposed methods can learn good object masks just by watching YouTube.
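
The weak-supervision idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the toy 2-D features, the `make_toy_videos` helper, the plain logistic-regression trainer, and all parameter values are assumptions chosen for illustration. Each segment inherits the tag of its video as a noisy label, so background segments in "dog" videos are mislabeled as positive, yet the learned classifier can still rank object-like segments above background-like ones. The graph-cut refinement step is not covered here.

```python
import math
import random


def weak_labels(videos, target_tag):
    """Give every segment the (noisy) label of the video it came from."""
    feats, labels = [], []
    for video in videos:
        y = 1 if target_tag in video["tags"] else 0
        for seg in video["segments"]:
            feats.append(seg)
            labels.append(y)
    return feats, labels


def score(w, b, x):
    """Probability-like score that segment feature x belongs to the object."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def train(feats, labels, epochs=200, lr=0.1):
    """Logistic regression fitted by batch gradient descent (illustrative choice)."""
    dim, n = len(feats[0]), len(feats)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(feats, labels):
            err = score(w, b, x) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def make_toy_videos(seed=0):
    """Hypothetical data: object segments cluster near (2, 2), background near (0, 0)."""
    rng = random.Random(seed)

    def seg(center):
        return [center + rng.gauss(0, 0.3), center + rng.gauss(0, 0.3)]

    videos = []
    for _ in range(10):
        # A "dog" video mixes object segments with background segments.
        videos.append({"tags": {"dog"},
                       "segments": [seg(2.0) for _ in range(5)]
                                   + [seg(0.0) for _ in range(5)]})
        # A video without the tag contains background segments only.
        videos.append({"tags": {"cat"},
                       "segments": [seg(0.0) for _ in range(10)]})
    return videos


videos = make_toy_videos()
X, y = weak_labels(videos, "dog")
w, b = train(X, y)
object_score = score(w, b, [2.0, 2.0])
background_score = score(w, b, [0.0, 0.0])
```

In this sketch, high-scoring segments would play the role of the object seeds that the abstract says are later refined into masks.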