Video coding based on audio-visual attention
ICME'09 Proceedings of the 2009 IEEE international conference on Multimedia and Expo
This paper proposes an efficient video coding method using audio-visual focus of attention, based on the observation that sound-emitting regions in an audio-visual sequence draw viewers' attention. First, an audio-visual source localization algorithm is presented, in which the sound source is identified using the correlation between the sound signal and the visual motion information. The localization result is then used to encode different regions of the scene at different quality levels, so that regions close to the source are encoded with higher quality than those far from it. This is implemented in the framework of H.264/AVC by assigning different quantization parameters to different regions. Experiments with both standard and high definition sequences demonstrate that the proposed method yields considerable coding gains over the constant quantization mode of H.264/AVC without noticeable degradation of perceived quality.
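The two stages described in the abstract can be sketched as follows. This is an illustrative simplification, not the authors' actual algorithm: the sound source is taken to be the image block whose motion-magnitude time series correlates most strongly with the audio energy envelope, and per-block quantization parameters (QPs) then grow with distance from that block. All function names, block grids, and QP ranges here are hypothetical.

```python
import numpy as np

def localize_source(audio_energy, block_motion):
    """Locate the sound-emitting block by audio-visual correlation.

    audio_energy: shape (T,), audio energy per frame.
    block_motion: shape (T, H, W), motion magnitude per block per frame.
    Returns the (row, col) of the block whose motion best tracks the audio.
    """
    T, H, W = block_motion.shape
    a = audio_energy - audio_energy.mean()
    best_corr, best_rc = -np.inf, (0, 0)
    for r in range(H):
        for c in range(W):
            m = block_motion[:, r, c] - block_motion[:, r, c].mean()
            denom = np.linalg.norm(a) * np.linalg.norm(m)
            corr = (a @ m) / denom if denom > 0 else 0.0
            if corr > best_corr:
                best_corr, best_rc = corr, (r, c)
    return best_rc

def qp_map(source_rc, shape, qp_min=24, qp_max=36):
    """Assign lower QP (higher quality) near the source, higher QP far away.

    The linear distance-to-QP mapping is an assumption for illustration.
    """
    H, W = shape
    rr, cc = np.mgrid[0:H, 0:W]
    dist = np.hypot(rr - source_rc[0], cc - source_rc[1])
    if dist.max() > 0:
        dist = dist / dist.max()
    return np.round(qp_min + dist * (qp_max - qp_min)).astype(int)
```

In an H.264/AVC setting, the resulting map would be consumed by the encoder's rate control as per-macroblock QP offsets; here it is simply returned as an array for inspection.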