Video event detection and summarization using audio, visual and text saliency

  • Authors:
  • G. Evangelopoulos;A. Zlatintsi;G. Skoumas;K. Rapantzikos;A. Potamianos;P. Maragos;Y. Avrithis

  • Affiliations:
  • School of ECE, National Technical University of Athens, 15773, Greece;School of ECE, National Technical University of Athens, 15773, Greece;Dept. of ECE, Technical University of Crete, 73100 Chania, Greece;School of ECE, National Technical University of Athens, 15773, Greece;Dept. of ECE, Technical University of Crete, 73100 Chania, Greece;School of ECE, National Technical University of Athens, 15773, Greece;School of ECE, National Technical University of Athens, 15773, Greece

  • Venue:
  • ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
  • Year:
  • 2009

Abstract

Detection of perceptually important video events is formulated here on the basis of saliency models for the audio, visual and textual information conveyed in a video stream. Audio saliency is assessed by cues that quantify multifrequency waveform modulations, extracted through nonlinear operators and energy tracking. Visual saliency is measured through a spatiotemporal attention model driven by intensity, color and motion. Text saliency is extracted from part-of-speech tagging of the subtitle information available with most movie distributions. The curves of the individual modalities are integrated into a single attention curve, where the presence of an event may be signified in one or multiple domains. This multimodal saliency curve is the basis of a bottom-up video summarization algorithm that refines results from unimodal or audiovisual-based skimming. The algorithm performs favorably for video summarization in terms of informativeness and enjoyability.
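
The fusion and skimming steps described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the modality curves are combined or how frames are selected, so everything below is an assumption made for illustration: the min-max normalization, the equal-weight linear fusion, the top-ratio frame selection, and the hypothetical names `normalize`, `fuse` and `select_frames`. It is not the authors' implementation.

```python
from typing import List, Sequence


def normalize(curve: Sequence[float]) -> List[float]:
    """Min-max scale a saliency curve to [0, 1]; a flat curve maps to zeros."""
    lo, hi = min(curve), max(curve)
    if hi == lo:
        return [0.0] * len(curve)
    return [(v - lo) / (hi - lo) for v in curve]


def fuse(
    audio: Sequence[float],
    visual: Sequence[float],
    text: Sequence[float],
    weights: Sequence[float] = (1 / 3, 1 / 3, 1 / 3),
) -> List[float]:
    """Weighted linear fusion of per-frame curves (an assumed scheme)."""
    curves = [normalize(c) for c in (audio, visual, text)]
    if len({len(c) for c in curves}) != 1:
        raise ValueError("all modality curves must have the same length")
    n_frames = len(curves[0])
    return [sum(w * c[i] for w, c in zip(weights, curves)) for i in range(n_frames)]


def select_frames(saliency: Sequence[float], keep_ratio: float) -> List[int]:
    """Keep the most salient fraction of frames, returned in temporal order."""
    n_keep = max(1, round(keep_ratio * len(saliency)))
    ranked = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)
    return sorted(ranked[:n_keep])


# Toy per-frame saliency values (made up for illustration, not measured data).
audio = [0.1, 0.9, 0.2, 0.1, 0.0, 0.3]
visual = [0.0, 0.8, 0.1, 0.0, 0.7, 0.1]
text = [0.0, 1.0, 0.0, 0.0, 1.0, 0.0]

multimodal = fuse(audio, visual, text)
skim = select_frames(multimodal, keep_ratio=0.5)
print(skim)  # [1, 4, 5]
```

In this toy input, frame 1 is salient in all three modalities and frame 4 only in the visual and text curves, yet both survive the skim, which mirrors the abstract's point that an event may be signified in one or multiple domains.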