Video event detection and summarization using audio, visual and text saliency

Authors:
G. Evangelopoulos;A. Zlatintsi;G. Skoumas;K. Rapantzikos;A. Potamianos;P. Maragos;Y. Avrithis
Affiliations:
School of ECE, National Technical University of Athens, 15773, Greece;School of ECE, National Technical University of Athens, 15773, Greece;Dept. of ECE, Technical University of Crete, 73100 Chania, Greece;School of ECE, National Technical University of Athens, 15773, Greece;Dept. of ECE, Technical University of Crete, 73100 Chania, Greece;School of ECE, National Technical University of Athens, 15773, Greece;School of ECE, National Technical University of Athens, 15773, Greece
Venue:
ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
Year:
2009

Citing: 0
Cited by: 3

Video summarization with visual and semantic features
PCM'10 Proceedings of the 11th Pacific Rim conference on Advances in multimedia information processing: Part I

Network-aware summarisation for resource discovery in P2P-content networks
Future Generation Computer Systems

P2P-based resource discovery in dynamic grids allowing multi-attribute and range queries
Parallel Computing

Abstract

Detection of perceptually important video events is formulated here on the basis of saliency models for the audio, visual and textual information conveyed in a video stream. Audio saliency is assessed by cues that quantify multifrequency waveform modulations, extracted through nonlinear operators and energy tracking. Visual saliency is measured through a spatiotemporal attention model driven by intensity, color and motion. Text saliency is derived from part-of-speech tagging of the subtitle information available with most movie distributions. The individual modality curves are integrated into a single attention curve, in which the presence of an event may be signified in one or multiple domains. This multimodal saliency curve is the basis of a bottom-up video summarization algorithm that refines results from unimodal or audiovisual-based skimming. The algorithm performs favorably for video summarization in terms of informativeness and enjoyability.
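
The fusion and skimming steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how the modality curves are combined or how segments are chosen, so everything here is an assumption made for illustration: min-max normalization of each curve, a weighted linear combination with equal default weights, and frame-level selection of the most salient frames up to a target ratio. The names `normalize`, `fuse`, `select_frames`, `weights` and `keep_ratio` are hypothetical and do not come from the paper.

```python
def normalize(curve):
    """Min-max scale a saliency curve to [0, 1]; a flat curve maps to zeros."""
    lo, hi = min(curve), max(curve)
    if hi == lo:
        return [0.0] * len(curve)
    return [(v - lo) / (hi - lo) for v in curve]


def fuse(audio, visual, text, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Combine three per-frame saliency curves into one attention curve.

    Assumption: a weighted linear sum of normalized curves. The paper may
    use a different fusion rule.
    """
    if not (len(audio) == len(visual) == len(text)):
        raise ValueError("all modality curves must have the same length")
    curves = [normalize(c) for c in (audio, visual, text)]
    return [
        sum(w * c[i] for w, c in zip(weights, curves))
        for i in range(len(audio))
    ]


def select_frames(saliency, keep_ratio):
    """Return indices of the most salient frames, in temporal order.

    Assumption: plain top-k selection by saliency value, keeping at least
    one frame. A real skimming method would also enforce segment continuity.
    """
    n_keep = max(1, round(keep_ratio * len(saliency)))
    ranked = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)
    return sorted(ranked[:n_keep])


# Toy per-frame curves (made-up values, four frames).
audio = [0.1, 0.9, 0.2, 0.1]
visual = [0.0, 0.5, 1.0, 0.0]
text = [0.0, 1.0, 0.0, 0.0]

fused = fuse(audio, visual, text)
summary = select_frames(fused, keep_ratio=0.5)
print(summary)
```

In this toy input, frame 1 is salient in all three modalities and frame 2 mainly in the visual one, so a 50% skim keeps those two frames. Changing `weights` shifts the emphasis between modalities, which is one simple way to compare unimodal, audiovisual and fully multimodal skims.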