Learning to segment a video to clips based on scene and camera motion

  • Authors:
  • Adarsh Kowdle;Tsuhan Chen

  • Affiliations:
  • Cornell University, Ithaca, NY;Cornell University, Ithaca, NY

  • Venue:
  • ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part III
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper, we present a novel learning-based algorithm for temporal segmentation of a video into clips based on both camera and scene motion, in particular, based on combinations of static vs. dynamic camera and static vs. dynamic scene. Given a video, we first perform shot boundary detection to segment the video to shots. We enforce temporal continuity by constructing a Markov Random Field (MRF) over the frames of each video shot with edges between consecutive frames and cast the segmentation problem as a frame level discrete labeling problem. Using manually labeled data we learn classifiers exploiting cues from optical flow to provide evidence for the different labels, and infer the best labeling over the frames. We show the effectiveness of the approach using user videos and full-length movies. Using sixty full-length movies spanning 50 years, we show that the proposed algorithm of grouping frames purely based on motion cues can aid computational applications such as recovering depth from a video and also reveal interesting trends in movies, which finds itself interesting novel applications in video analysis (time-stamping archive movies) and film studies.