Summarizing audiovisual contents of a video program

Authors: Yihong Gong
Affiliations: NEC Laboratories America, Inc., Cupertino, CA
Venue: EURASIP Journal on Applied Signal Processing
Year: 2003

Abstract

In this paper, we focus on video programs that are intended to disseminate information and knowledge, such as news, documentaries, and seminars, and present an audiovisual summarization system that summarizes the audio and visual contents of the given video separately and then integrates the two summaries with a partial alignment. The audio summary is created by selecting the spoken sentences that best present the main content of the audio speech, while the visual summary is created by eliminating duplicates and redundancies and preserving visually rich contents in the image stream. The alignment operation aims to synchronize each spoken sentence in the audio summary with its corresponding speaker's face and to preserve the rich content in the visual summary. A bipartite graph-based audiovisual alignment algorithm is developed to efficiently find the best alignment solution that satisfies these alignment requirements. With the proposed system, we strive to produce a video summary that (1) provides a natural visual and audio content overview, and (2) maximizes the coverage of both the audio and visual contents of the original video without having to sacrifice either of them.
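
The abstract does not spell out the alignment algorithm, so the following is only a minimal sketch of the general idea of bipartite alignment, not the paper's method. It assumes that each spoken sentence and each speaker-face shot is a time interval in seconds, and that a sentence may be aligned with a shot only when the two overlap in time; the overlap criterion and all names (overlaps, build_edges, max_bipartite_matching) are hypothetical. The matching itself is the standard augmenting-path approach to maximum bipartite matching.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (start_sec, end_sec); an assumed representation


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two time intervals share any duration."""
    return a[0] < b[1] and b[0] < a[1]


def build_edges(sentences: List[Interval],
                face_shots: List[Interval]) -> Dict[int, List[int]]:
    """Connect each sentence to every face shot it overlaps (assumed criterion)."""
    return {
        i: [j for j, shot in enumerate(face_shots) if overlaps(sent, shot)]
        for i, sent in enumerate(sentences)
    }


def max_bipartite_matching(edges: Dict[int, List[int]],
                           n_right: int) -> Dict[int, int]:
    """Maximum bipartite matching via augmenting paths.

    Returns a mapping sentence index -> face-shot index.
    """
    match_right = [-1] * n_right  # match_right[v] = sentence assigned to shot v

    def try_assign(u: int, seen: List[bool]) -> bool:
        for v in edges.get(u, []):
            if seen[v]:
                continue
            seen[v] = True
            # Take a free shot, or re-route the sentence currently holding it.
            if match_right[v] == -1 or try_assign(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    for u in edges:
        try_assign(u, [False] * n_right)
    return {u: v for v, u in enumerate(match_right) if u != -1}


# Hypothetical data: three selected sentences and three face shots.
sentences = [(0.0, 5.0), (5.0, 9.0), (20.0, 25.0)]
face_shots = [(3.0, 6.0), (0.0, 4.5), (30.0, 35.0)]

alignment = max_bipartite_matching(build_edges(sentences, face_shots),
                                   len(face_shots))
print(alignment)
```

In this toy input, sentence 0 overlaps shots 0 and 1 while sentence 1 overlaps only shot 0, so the augmenting step moves sentence 0 to shot 1 to make room. Sentence 2 overlaps no face shot and stays unaligned, which is one way to read the "partial alignment" the abstract mentions: unaligned audio segments would be free to accompany other visually rich content.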