Sequence-kernel based sparse representation for amateur video summarization

Authors:
Zheshen Wang; Mrityunjay Kumar; Jiebo Luo; Baoxin Li
Affiliations:
Arizona State University, Tempe, AZ, USA; Eastman Kodak Company, Rochester, NY, USA; Eastman Kodak Company, Rochester, NY, USA; Arizona State University, Tempe, AZ, USA
Venue:
J-MRE '11 Proceedings of the 2011 joint ACM workshop on Modeling and representing events
Year:
2011

Abstract

Automatic video summarization is critical for fast browsing and efficient management of multimedia data. Existing methods focus on well-edited videos with predefined structures (e.g., movies) or constrained content (e.g., news or sports videos). Summarizing unconstrained amateur or consumer videos poses different challenges: extremely diverse content, no pre-imposed structure, and typically mediocre video quality. To address these challenges, we explore a signal-reconstruction-based approach that relies only on visual content. In particular, we propose a sequence-kernel-based sparse representation approach for directly summarizing consumer videos. A dictionary of subsequences is first constructed from clustered frames with importance ranking scores of extracted high-level semantics. Video summarization is then formulated as seeking an optimal combination of dictionary elements that robustly represents the original video. A weighted-sequence distance is used to compute the approximation error, and the kernel-based feature-sign algorithm is used to estimate the sparse coefficients. The linear combination over the dictionary with the resulting optimal sparse coefficients is output as the final summary video. Extensive experiments are performed on 18 videos with subjective ratings from 7 evaluators. Results of the proposed approach compare favorably with two existing methods both visually and quantitatively, validating its effectiveness.
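
The reconstruction idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: the sequence kernel is an unweighted mean of frame-to-frame Gaussian kernels (the paper uses a weighted-sequence distance), the solver is plain iterative soft-thresholding (ISTA) rather than the kernel-based feature-sign algorithm, and the summary is read off as the dictionary subsequences with nonzero coefficients. The names `seq_kernel` and `kernel_sparse_code` and the parameters `gamma`, `lam`, and `iters` are hypothetical.

```python
import math

def rbf(a, b, gamma=0.5):
    """Gaussian kernel between two frame feature vectors."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))

def seq_kernel(s, t, gamma=0.5):
    """Stand-in sequence kernel (assumption): mean frame-to-frame RBF similarity."""
    return sum(rbf(a, b, gamma) for a in s for b in t) / (len(s) * len(t))

def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def kernel_sparse_code(video, dictionary, lam=0.1, iters=200):
    """Minimize 0.5*x'Kx - x'k + lam*||x||_1 with ISTA.

    K[i][j] is the kernel between dictionary subsequences i and j, and k[i]
    is the kernel between the whole video and subsequence i. The constant
    term 0.5*kernel(video, video) does not affect the minimizer and is dropped.
    """
    n = len(dictionary)
    K = [[seq_kernel(di, dj) for dj in dictionary] for di in dictionary]
    k = [seq_kernel(video, d) for d in dictionary]
    # K is positive semidefinite, so trace(K) bounds its largest eigenvalue
    # and 1/trace(K) is a safe step size.
    step = 1.0 / sum(K[i][i] for i in range(n))
    x = [0.0] * n
    for _ in range(iters):
        grad = [sum(K[i][j] * x[j] for j in range(n)) - k[i] for i in range(n)]
        x = [soft_threshold(x[i] - step * grad[i], step * lam) for i in range(n)]
    return x

# Toy data: the video alternates between two scenes (A and B); the third
# dictionary subsequence (C) shows content that never appears in the video.
A, B, C = [0.0, 0.0], [5.0, 5.0], [20.0, -20.0]
video = [A, A, B, B]
dictionary = [[A, A], [B, B], [C, C]]

coeffs = kernel_sparse_code(video, dictionary)
summary = [i for i, c in enumerate(coeffs) if abs(c) > 1e-8]
print(summary)  # -> [0, 1]
```

In this toy case the L1 penalty drives the coefficient of the unrelated subsequence to exactly zero, so only the two subsequences that actually reconstruct the video remain. A larger `lam` would yield a shorter summary at the cost of a larger approximation error.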