A Hierarchical Visual Model for Video Object Summarization

  • Authors:
  • David Liu; Gang Hua; Tsuhan Chen

  • Affiliations:
  • Siemens Corporate Research, Princeton; Nokia Research Center Hollywood, Santa Monica; Cornell University, Ithaca

  • Venue:
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • Year:
  • 2010

Abstract

We propose a novel method for removing irrelevant frames from a video, given user-provided frame-level labels for only a very small number of frames. We first hypothesize a number of windows that possibly contain the object of interest, and then determine which window(s) truly contain it. Our method enjoys several favorable properties. First, compared to approaches that describe a whole frame with a single descriptor, each window's feature descriptor has the chance of genuinely describing the object of interest, so it is less affected by background clutter. Second, by exploiting the temporal continuity of the video instead of treating frames as independent, we can hypothesize the locations of the windows more accurately. Third, by infusing prior knowledge into the patch-level model, we can precisely follow the trajectory of the object of interest. This allows us to greatly reduce the number of windows and hence the risk of overfitting during learning. We demonstrate the effectiveness of the method by comparing it to several other semi-supervised learning approaches on challenging video clips.
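
To make the described pipeline concrete, the sketch below hypothesizes candidate windows in each frame, scores every window with a classifier trained from the few labeled frames, smooths the scores over time to exploit temporal continuity, and keeps only the frames whose best window scores high. This is a minimal illustration under assumed stand-in choices, not the authors' hierarchical model: the grid window proposer (`propose_windows`), the histogram descriptor (`window_descriptor`), the logistic-regression scorer, and the smoothing/threshold parameters are all placeholders.

```python
# Hypothetical sketch of window-based frame filtering for video summarization.
# The window proposer, descriptor, and classifier below are simple stand-ins,
# not the patch-level model described in the paper.

import numpy as np
from sklearn.linear_model import LogisticRegression


def propose_windows(frame, step=40, size=80):
    """Hypothesize candidate windows on a regular grid (placeholder proposer)."""
    h, w = frame.shape[:2]
    boxes = []
    for y in range(0, max(h - size, 1), step):
        for x in range(0, max(w - size, 1), step):
            boxes.append((x, y, size, size))
    return boxes


def window_descriptor(frame, box, bins=16):
    """Describe one window with an intensity histogram (placeholder feature)."""
    x, y, s, _ = box
    patch = frame[y:y + s, x:x + s]
    hist, _ = np.histogram(patch, bins=bins, range=(0, 255), density=True)
    return hist


def summarize(frames, labeled_idx, labels, keep_thresh=0.5, smooth=2):
    """Return indices of frames judged to contain the object of interest.

    frames      : list of HxW uint8 arrays (grayscale video frames)
    labeled_idx : indices of the few user-labeled frames
    labels      : 1 if the object is present in that frame, else 0
    """
    # Train a window-level classifier from the labeled frames only.
    # Weak labels: every window inherits the label of its frame.
    X, y = [], []
    for i, lab in zip(labeled_idx, labels):
        for box in propose_windows(frames[i]):
            X.append(window_descriptor(frames[i], box))
            y.append(lab)
    clf = LogisticRegression(max_iter=1000).fit(np.array(X), np.array(y))

    # Score each frame by its best-scoring candidate window.
    frame_scores = []
    for frame in frames:
        feats = np.array([window_descriptor(frame, b) for b in propose_windows(frame)])
        frame_scores.append(clf.predict_proba(feats)[:, 1].max())
    frame_scores = np.array(frame_scores)

    # Exploit temporal continuity: average scores over neighboring frames.
    kernel = np.ones(2 * smooth + 1) / (2 * smooth + 1)
    smoothed = np.convolve(frame_scores, kernel, mode="same")
    return np.where(smoothed >= keep_thresh)[0]
```

For example, `summarize(frames, labeled_idx=[0, 50], labels=[1, 0])` would return the indices of the frames to keep; the actual method replaces these placeholders with a hierarchical, patch-level model that follows the object's trajectory.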