Bridging low-level features and high-level semantics via fMRI brain imaging for video classification

Authors:
Xintao Hu;Fan Deng;Kaiming Li;Tuo Zhang;Hanbo Chen;Xi Jiang;Jinglei Lv;Dajiang Zhu;Carlos Faraco;Degang Zhang;Arsham Mesbah;Junwei Han;Xiansheng Hua;Li Xie;Stephen Miller;Lei Guo;Tianming Liu
Affiliations:
Northwestern Polytechnical University, Xi'an, China;the University of Georgia, Athens, GA, USA;Northwestern Polytechnical University, Xi'an, China;Northwestern Polytechnical University, Xi'an, China;Northwestern Polytechnical University, Xi'an, China;Northwestern Polytechnical University, Xi'an, China;Northwestern Polytechnical University, Xi'an, China;the University of Georgia, Athens, GA, USA;the University of Georgia, Athens, GA, USA;Northwestern Polytechnical University, Xi'an, China;the University of Georgia, Athens, GA, USA;Northwestern Polytechnical University, Xi'an, China;Microsoft Research Asia, Beijing, China;Zhejiang University, Zhejiang, China;the University of Georgia, Athens, GA, USA;Northwestern Polytechnical University, Xi'an, China;the University of Georgia, Athens, GA, USA
Venue:
Proceedings of the international conference on Multimedia
Year:
2010

Abstract

The multimedia content analysis community has made significant efforts to bridge the gap between low-level features and the high-level semantics perceived by human cognitive systems, such as real-world objects and concepts. Both low-level features and high-level semantics are studied extensively in multimedia analysis and in brain imaging. In multimedia analysis, for instance, many algorithms are available for feature extraction, along with benchmark datasets such as TRECVID. In brain imaging, the brain regions responsible for vision, auditory perception, language, and working memory have been well studied via functional magnetic resonance imaging (fMRI). This paper presents our initial effort to combine these two fields in order to bridge the gap between low-level features and high-level semantics via fMRI brain imaging. In our experimental paradigm, we acquired fMRI scans while university student subjects watched video clips selected from the TRECVID datasets. At the current stage, we focus on three concepts specified in TRECVID 2005: sports, weather, and commercial/advertisement. The brain regions in the vision, auditory, language, and working memory networks are quantitatively localized and mapped via task-based fMRI, and the fMRI responses in these regions are used to extract features that represent the brain's comprehension of semantics. Our computational framework learns the low-level feature sets that best correlate with the fMRI-derived semantics from training videos with fMRI scans; the learned models are then applied to larger-scale test datasets without fMRI scans for category classification.
Our results show that: 1) there are meaningful couplings between the brain's fMRI responses and video stimuli, suggesting the validity of linking semantics and low-level features via fMRI; and 2) the low-level feature sets computationally learned from fMRI-derived semantic features can significantly improve the classification of video categories compared with classification based on the original low-level features.
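The two-stage idea in the abstract (learn which low-level features track the fMRI-derived semantics on a small scanned training set, then classify unscanned test clips using only those features) can be illustrated with a minimal sketch. The abstract does not state the selection criterion or the classifier, so the absolute Pearson correlation ranking and the nearest-centroid classifier below are stand-in assumptions, not the authors' method. The data, the function names, and the single scalar fMRI feature per clip are all hypothetical.

```python
import math
import random

CATEGORIES = ["sports", "weather", "commercial"]  # the three concepts named in the abstract


def make_clips(rng, n_per_class, with_fmri):
    """Hypothetical synthetic clips: features 0 and 2 carry category signal, 1 and 3 are noise."""
    rows, labels, fmri = [], [], []
    for c, name in enumerate(CATEGORIES):
        for _ in range(n_per_class):
            rows.append([
                c + rng.uniform(-0.1, 0.1),         # informative
                rng.uniform(-1.0, 1.0),             # noise
                -2.0 * c + rng.uniform(-0.1, 0.1),  # informative
                rng.uniform(-1.0, 1.0),             # noise
            ])
            labels.append(name)
            if with_fmri:
                # Assumed scalar fMRI-derived semantic feature for this clip.
                fmri.append(c + rng.uniform(-0.1, 0.1))
    return rows, labels, fmri


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / math.sqrt(vx * vy)


def select_features(rows, fmri_feature, k):
    """Indices of the k low-level features with the largest |correlation| to the fMRI feature."""
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, fmri_feature)), j))
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:k])


def fit_centroids(rows, labels, selected):
    """Mean vector per category over the selected features only."""
    grouped = {}
    for row, label in zip(rows, labels):
        grouped.setdefault(label, []).append([row[j] for j in selected])
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def predict(row, centroids, selected):
    """Assign the category whose centroid is nearest in squared Euclidean distance."""
    vec = [row[j] for j in selected]
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(vec, centroids[label])),
    )


rng = random.Random(0)
train_rows, train_labels, train_fmri = make_clips(rng, 10, with_fmri=True)
test_rows, test_labels, _ = make_clips(rng, 20, with_fmri=False)  # no fMRI at test time

selected = select_features(train_rows, train_fmri, k=2)
centroids = fit_centroids(train_rows, train_labels, selected)
predictions = [predict(row, centroids, selected) for row in test_rows]
accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)
print("selected features:", selected, "test accuracy:", accuracy)
```

The point of the sketch is the data flow: the fMRI signal is used only during feature selection on the training clips, and the test clips are classified from low-level features alone. On this toy data the selection step is expected to keep the two informative features and discard the two noise features.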