Extracting key frames from video is of great interest in many applications, such as video summarization, video organization, video compression, and prints from video. Key frame extraction is not a new problem, but the existing literature has focused primarily on sports or news video. In the personal or consumer video space, the biggest challenges for key frame selection are the unconstrained content and the lack of any pre-imposed structure.

First, in a psychovisual study, we collect ground truth key frames from video clips taken by digital cameras (as opposed to camcorders), using both first- and third-party judges. The goals of this study are to: 1) create a reference database of video clips reasonably representative of the consumer video space; 2) identify consensus key frames by which automated algorithms can be compared and judged for effectiveness, i.e., ground truth; and 3) uncover the criteria used by both first- and third-party human judges so that these criteria can inform algorithm design.

Next, we develop an automatic key frame extraction method dedicated to summarizing consumer video clips acquired from digital cameras. Analysis of spatio-temporal changes over time provides semantically meaningful information about the scene and the camera operator's general intent. In particular, camera and object motion are estimated and used to derive motion descriptors. A video clip is segmented into homogeneous parts based on the major types of camera motion (e.g., pan, zoom, pause, steady). Dedicated rules are used to extract candidate key frames from each segment. In addition, confidence measures are computed for the candidates to enable ranking by semantic relevance. This method is scalable, so one can produce any desired number of key frames from the candidates. Finally, we demonstrate the effectiveness of our method by comparing its results and those of two alternative methods against the ground truth agreed upon by multiple judges.
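The pipeline described above (segment the clip by camera-motion type, extract a candidate key frame per segment with dedicated rules, then rank candidates by confidence to yield any desired number of key frames) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the per-frame motion labels and confidence scores are assumed inputs (in the paper they come from camera/object motion estimation), and the per-segment rules and thresholds here are invented for demonstration.

```python
def segment_by_camera_motion(motion_labels):
    """Group consecutive frames sharing the same camera-motion label
    (e.g., 'pan', 'zoom', 'pause', 'steady') into homogeneous segments.
    Returns a list of (start, end, label) with end exclusive."""
    segments = []
    start = 0
    for i in range(1, len(motion_labels) + 1):
        if i == len(motion_labels) or motion_labels[i] != motion_labels[i - 1]:
            segments.append((start, i, motion_labels[start]))
            start = i
    return segments

def candidate_from_segment(start, end, label):
    """Toy per-segment rule (hypothetical): a pause/steady segment
    contributes its middle frame; a pan or zoom contributes its last
    frame, taken as the motion's end state."""
    if label in ("pause", "steady"):
        return (start + end - 1) // 2
    return end - 1

def extract_key_frames(motion_labels, confidences, k):
    """Pick one candidate per segment, rank candidates by confidence,
    and return the top-k frame indices in temporal order."""
    segments = segment_by_camera_motion(motion_labels)
    candidates = [candidate_from_segment(s, e, lbl) for s, e, lbl in segments]
    ranked = sorted(candidates, key=lambda f: confidences[f], reverse=True)
    return sorted(ranked[:k])

# Example: 12 frames, three motion segments, synthetic confidence scores.
labels = ["steady"] * 4 + ["pan"] * 3 + ["pause"] * 5
conf = [0.2, 0.3, 0.9, 0.1, 0.4, 0.4, 0.7, 0.5, 0.6, 0.95, 0.3, 0.2]
print(extract_key_frames(labels, conf, 2))  # -> [6, 9]
```

The ranking step is what makes the method scalable: raising or lowering `k` grows or shrinks the summary without recomputing segmentation or candidates.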