Towards extracting semantically meaningful key frames from personal video clips: from humans to computers

  • Authors:
  • Jiebo Luo; Christophe Papin; Kathleen Costello

  • Affiliations:
  • Kodak Research Laboratories, Eastman Kodak Company, Rochester, NY; Multimedia Processing Group of Thales Communications France, Colombes Cedex, France and Kodak Research Laboratories, Eastman Kodak Company, Rochester, NY; Kodak Research Laboratories, Eastman Kodak Company, Rochester, NY

  • Venue:
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Year:
  • 2009

Abstract

Extracting key frames from video is of great interest in many applications, such as video summarization, video organization, video compression, and making prints from video. Key frame extraction is not a new problem, but the existing literature has focused primarily on sports or news video. In the personal or consumer video space, the biggest challenges for key frame selection are the unconstrained content and the lack of any pre-imposed structure. First, in a psychovisual study, we collect ground-truth key frames from video clips taken by digital cameras (as opposed to camcorders) using both first- and third-party judges. The goals of this study are to: 1) create a reference database of video clips reasonably representative of the consumer video space; 2) identify consensus key frames against which automated algorithms can be compared and judged for effectiveness, i.e., ground truth; and 3) uncover the criteria used by both first- and third-party human judges so these criteria can influence algorithm design. Next, we develop an automatic key frame extraction method dedicated to summarizing consumer video clips acquired from digital cameras. Analysis of spatio-temporal changes over time provides semantically meaningful information about the scene and the camera operator's general intent. In particular, camera and object motion are estimated and used to derive motion descriptors. A video clip is segmented into homogeneous parts based on the major types of camera motion (e.g., pan, zoom, pause, steady). Dedicated rules are used to extract candidate key frames from each segment. In addition, confidence measures are computed for the candidates to enable ranking by semantic relevance. This method is scalable, so one can produce any desired number of key frames from the candidates. Finally, we demonstrate the effectiveness of our method by comparing its results with those of two alternative methods against the ground truth agreed upon by multiple judges.
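The pipeline described in the abstract — classify per-frame camera motion, segment the clip into homogeneous runs, pick one candidate key frame per segment with a confidence score, and keep the top-k — can be illustrated with a minimal sketch. The paper's actual motion estimation, rules, and confidence measures are not given here, so the descriptor names, thresholds, and scoring weights below are all illustrative assumptions:

```python
# Hedged sketch of the segment-then-rank key frame pipeline.
# Inputs are assumed pre-computed per-frame motion descriptors
# (pan magnitude and zoom factor); real systems would estimate
# these from the video via global motion estimation.

def label_frame(pan, zoom, pan_th=0.5, zoom_th=0.05):
    """Classify a frame's dominant camera motion (illustrative thresholds)."""
    if abs(zoom) > zoom_th:
        return "zoom"
    if abs(pan) > pan_th:
        return "pan"
    return "pause"

def segment_by_motion(pans, zooms):
    """Group consecutive frames with the same motion label into
    homogeneous segments, returned as (label, start, end) tuples."""
    labels = [label_frame(p, z) for p, z in zip(pans, zooms)]
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((labels[start], start, i - 1))
            start = i
    return segments

def candidate_key_frames(segments):
    """One candidate per segment with an illustrative confidence score:
    pauses rank highest (holding the camera still suggests intent),
    then the end of a zoom, then the midpoint of a pan. Longer
    segments get more confidence, capped at 1.0."""
    weights = {"pause": 1.0, "zoom": 0.7, "pan": 0.4}
    candidates = []
    for label, s, e in segments:
        frame = e if label == "zoom" else (s + e) // 2
        conf = weights[label] * min(1.0, (e - s + 1) / 10.0)
        candidates.append((frame, conf, label))
    return sorted(candidates, key=lambda c: -c[1])

def top_key_frames(candidates, k):
    """Scalable output: keep the k highest-confidence candidates,
    returned in temporal order."""
    return sorted(f for f, _, _ in candidates[:k])
```

For example, a clip whose descriptors show ten frames of panning, ten of pause, and ten of zooming yields three segments and three ranked candidates; asking for two key frames returns the pause and zoom candidates, while asking for one returns only the pause frame — the scalability property the abstract describes.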