Weakly supervised learning of object segmentations from web-scale video

  • Authors:
  • Glenn Hartmann (Google Research); Matthias Grundmann (Georgia Institute of Technology); Judy Hoffman (University of California, Berkeley); David Tsai (Georgia Institute of Technology); Vivek Kwatra (Google Research); Omid Madani (Google Research); Sudheendra Vijayanarasimhan (Google Research); Irfan Essa (Georgia Institute of Technology); James Rehg (Georgia Institute of Technology); Rahul Sukthankar (Google Research)

  • Venue:
  • ECCV'12: Proceedings of the 12th European Conference on Computer Vision - Volume Part I
  • Year:
  • 2012

Abstract

We propose to learn pixel-level segmentations of objects from weakly labeled (tagged) internet videos. Specifically, given a large collection of raw YouTube content along with potentially noisy tags, our goal is to automatically generate spatio-temporal masks for each object, such as "dog", without employing any pre-trained object detectors. We formulate this problem as learning weakly supervised classifiers for a set of independent spatio-temporal segments. The object seeds obtained from these segment-level classifiers are then refined using graph cuts to generate high-precision object masks. Training on a dataset of 20,000 YouTube videos weakly tagged into 15 classes, we demonstrate automatic extraction of pixel-level object masks. Evaluated against a ground-truthed subset of 50,000 frames with pixel-level annotations, we confirm that our proposed methods can learn good object masks just by watching YouTube.
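The core idea in the abstract can be sketched in a few lines: video-level tags are propagated to every spatio-temporal segment in the video (a deliberately noisy labeling), a per-class segment classifier is trained on these labels, and the highest-scoring segments become object seeds. The sketch below is a minimal illustration of that weak-supervision setup, not the paper's implementation: the features, the `make_video` generator, and the plain logistic-regression classifier are all hypothetical stand-ins, and the graph-cut refinement stage is omitted entirely.

```python
# Hedged sketch of weakly supervised segment classification.
# Assumption: each video is a bag of segment feature vectors; in a
# tagged ("dog") video, only a few segments actually show the object.
import numpy as np

rng = np.random.default_rng(0)

def make_video(tagged, n_segments=20, dim=5):
    """Toy segment features; the first quarter of a tagged video's
    segments carry the object signal, the rest are background noise."""
    X = rng.normal(size=(n_segments, dim))
    if tagged:
        X[: n_segments // 4] += 2.0  # segments that actually show the object
    return X

def train_segment_classifier(videos, tags, lr=0.1, epochs=200):
    """Logistic regression over segments, with every segment inheriting
    its video's tag as a (noisy) label -- the weak-supervision step."""
    X = np.vstack(videos)
    y = np.concatenate([np.full(len(v), float(t)) for v, t in zip(videos, tags)])
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probabilities
        g = p - y                                 # logistic-loss gradient
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

def seed_segments(X, w, b, k=5):
    """Return the k segments the classifier scores highest: the object
    seeds that a refinement stage (graph cuts, in the paper) would grow."""
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return np.argsort(p)[-k:]

videos = [make_video(t) for t in (True, True, False, False)]
w, b = train_segment_classifier(videos, [True, True, False, False])
seeds = seed_segments(videos[0], w, b)
```

Even though most "positive" segments are mislabeled background, the classifier still picks up the object signal, because the signal-bearing segments are the only consistent difference between tagged and untagged videos; this is the intuition behind learning from noisy video-level tags.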