A novel multiresolution spatiotemporal saliency detection model and its applications in image and video compression

Authors:
Chenlei Guo;Liming Zhang
Affiliations:
Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh and Department of Electronic Engineering, Fudan University, Shanghai, China;Department of Electronic Engineering, Fudan University, Shanghai, China
Venue:
IEEE Transactions on Image Processing
Year:
2010

Abstract

Salient areas in natural scenes are generally regarded as the areas on which the human eye typically focuses, and finding them is a key step in object detection. In computer vision, many models have been proposed to simulate the behavior of the eye, such as the SaliencyToolBox (STB) and the Neuromorphic Vision Toolkit (NVT), but they are computationally expensive, and the usefulness of their results depends heavily on the choice of parameters. Although some region-based approaches were proposed to reduce the computational complexity of feature maps, they still could not run in real time. Recently, a simple and fast approach called spectral residual (SR) was proposed, which uses the SR of the amplitude spectrum to calculate an image's saliency map. However, in our previous work, we pointed out that it is the phase spectrum, not the amplitude spectrum, of an image's Fourier transform that is key to locating salient areas, and we proposed the phase spectrum of Fourier transform (PFT) model. In this paper, we present a quaternion representation of an image that is composed of intensity, color, and motion features. Based on the principle of PFT, we propose a novel multiresolution spatiotemporal saliency detection model called phase spectrum of quaternion Fourier transform (PQFT), which calculates the spatiotemporal saliency map of an image from its quaternion representation. Distinct from other models, the added motion dimension allows the phase spectrum to represent spatiotemporal saliency, so that attention selection can be performed not only for images but also for videos. In addition, the PQFT model can compute the saliency map of an image at various resolutions, from coarse to fine. Therefore, a hierarchical selectivity (HS) framework based on the PQFT model is introduced to construct a tree-structured representation of an image. With the help of HS, a model called multiresolution wavelet domain foveation (MWDF) is proposed to improve coding efficiency in image and video compression. Extensive tests on videos, natural images, and psychological patterns show that the proposed PQFT model is more effective in saliency detection and predicts eye fixations better than other state-of-the-art models in the literature. Moreover, our model has a low computational cost and can therefore work in real time. Additional experiments on image and video compression show that the HS-MWDF model can achieve a higher compression rate than the traditional model.