Summarizing Rushes Videos by Motion, Object, and Event Understanding

  • Authors:
  • Feng Wang; Chong-Wah Ngo

  • Affiliations:
  • Department of Computer Science and Technology, East China Normal University, Shanghai, China; Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong

  • Venue:
  • IEEE Transactions on Multimedia
  • Year:
  • 2012

Abstract

Rushes footage is considered a cheap gold mine with potential for reuse in the broadcasting and filmmaking industries. However, mining “gold” from unedited videos such as rushes is challenging because the reusable segments are buried in a large amount of redundant material. In this paper, we propose a unified framework for stock footage classification and summarization to support video editors in navigating and organizing rushes videos. Our approach consists of two steps. First, we employ motion features to filter out undesired camera motion and locate the stock footage. A hierarchical hidden Markov model (HHMM) is proposed to model the motion feature distribution and to classify video segments into categories that indicate their potential for reuse. Second, we generate a short video summary that facilitates quick browsing of the stock footage by including the objects and events that are important for storytelling. For objects, we detect the presence of persons and moving objects. For events, we extract a set of features to detect and describe visual events (motion activities and scene changes) and audio events (speech clips). A representability measure is then proposed to select the most representative video clips for the summary. Our experiments show that the proposed HHMM significantly outperforms methods based on SVM, FSM, and HMM. Based on the TRECVID 2007 and 2008 evaluations and our own subjective evaluations, the automatically generated rushes summaries are also shown to be easy to understand, to contain little redundancy, and to cover the ground-truth objects and events within shorter durations and with a relatively pleasant rhythm.
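
The abstract does not give the concrete form of the representability measure, but the clip-selection step it describes can be illustrated with a small sketch. The snippet below is an assumption-laden illustration, not the authors' implementation: it greedily picks clips whose (hypothetical) per-second coverage of detected objects and events (persons, moving objects, motion activities, scene changes, speech) is highest, until a summary duration budget is reached. The `Clip` type, the `covers` labels, and the `representability`/`summarize` helpers are names introduced purely for this example.

```python
# Illustrative sketch of representability-driven clip selection.
# NOT the paper's method: the scoring function and data model are assumptions.

from dataclasses import dataclass, field


@dataclass
class Clip:
    clip_id: str
    duration: float                           # seconds
    covers: set = field(default_factory=set)  # e.g. {"person", "speech", "scene_change"}


def representability(clip: Clip, uncovered: set) -> float:
    """Toy score: newly covered objects/events per second of clip (assumed measure)."""
    if clip.duration <= 0:
        return 0.0
    return len(clip.covers & uncovered) / clip.duration


def summarize(clips: list, budget: float) -> list:
    """Greedily add the most representative clips until the duration budget is spent."""
    uncovered = set().union(*(c.covers for c in clips)) if clips else set()
    remaining, summary, used = list(clips), [], 0.0
    while remaining and uncovered:
        best = max(remaining, key=lambda c: representability(c, uncovered))
        gain = representability(best, uncovered)
        remaining.remove(best)
        if gain == 0.0:
            break            # no remaining clip adds new objects/events
        if used + best.duration > budget:
            continue         # skip clips that would exceed the summary budget
        summary.append(best)
        used += best.duration
        uncovered -= best.covers
    return summary


if __name__ == "__main__":
    clips = [
        Clip("c1", 4.0, {"person", "speech"}),
        Clip("c2", 6.0, {"scene_change"}),
        Clip("c3", 3.0, {"person", "motion"}),
    ]
    for c in summarize(clips, budget=10.0):
        print(c.clip_id, c.duration, sorted(c.covers))
```

A greedy coverage-per-second heuristic like this favors short clips that introduce new content, which matches the abstract's stated goals of low redundancy and short summary duration; the paper's actual measure and the upstream HHMM-based footage classification are not reproduced here.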