Compared to well-edited videos with predefined structures (e.g., news or sports videos), extracting key frames from unconstrained consumer videos remains a much more challenging problem due to their extremely diverse content (no pre-imposed structure) and uncontrolled video quality (e.g., poor lighting or camera shake). To exploit the spatio-temporal correlation present in the video for key frame extraction, we propose a bi-layer group sparse representation in which the input video frames are first segmented into homogeneous patches, and group sparsity is imposed at two levels simultaneously: (i) patch-to-frame and (ii) frame-to-sequence. The grouped sparse coefficients are further combined with frame quality scores to generate key frames. Extensive experiments on videos from actual end users show that the proposed approach compares favorably with existing methods, confirming its effectiveness.
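The core computational ingredient described above is group-sparse coding: coefficients are penalized in groups so that entire groups (e.g., all patches belonging to one frame) are selected or zeroed out together. As an illustrative sketch only (not the authors' actual bi-layer formulation), the single-level group-lasso problem min_c 0.5·||y − Dc||² + λ·Σ_g ||c_g||₂ can be solved by proximal gradient descent (ISTA) with block soft-thresholding; the dictionary `D`, grouping, and parameter values below are hypothetical:

```python
import numpy as np

def group_soft_threshold(x, groups, tau):
    """Block soft-thresholding: shrink each coefficient group toward zero,
    zeroing a group entirely when its l2-norm falls below tau."""
    out = np.zeros_like(x)
    for g in groups:
        norm = np.linalg.norm(x[g])
        if norm > tau:
            out[g] = (1.0 - tau / norm) * x[g]
    return out

def group_sparse_code(D, y, groups, lam=0.1, n_iter=200):
    """ISTA for the group lasso: min_c 0.5*||y - D c||^2 + lam * sum_g ||c_g||_2.
    D: dictionary (columns could be patch features), groups: index lists,
    one group per frame in the key-frame-extraction analogy."""
    c = np.zeros(D.shape[1])
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = D.T @ (D @ c - y)                       # gradient of the data term
        c = group_soft_threshold(c - step * grad, groups, lam * step)
    return c
```

In this analogy, frames whose coefficient groups survive thresholding are the candidates later re-ranked by the frame quality scores; a bi-layer version would apply the penalty at both the patch-to-frame and frame-to-sequence levels.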