Pattern Recognition with Fuzzy Objective Function Algorithms
Hierarchical movie affective content analysis based on arousal and valence features
MM '08 Proceedings of the 16th ACM international conference on Multimedia
ISM '08 Proceedings of the 2008 Tenth IEEE International Symposium on Multimedia
Latent topic driving model for movie affective scene classification
MM '09 Proceedings of the 17th ACM international conference on Multimedia
Music video affective understanding using feature importance analysis
Proceedings of the ACM International Conference on Image and Video Retrieval
ICIP'09 Proceedings of the 16th IEEE international conference on Image processing
Affective content analysis of music video clips
MIRUM '11 Proceedings of the 1st international ACM workshop on Music information retrieval with user-centered and multimodal strategies
DEAP: A Database for Emotion Analysis; Using Physiological Signals
IEEE Transactions on Affective Computing
IEEE Transactions on Multimedia
A Large Scale Experiment for Mood-Based Classification of TV Programmes
ICME '12 Proceedings of the 2012 IEEE International Conference on Multimedia and Expo
3D Convolutional Neural Networks for Human Action Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence
Among the ever-growing volume of available multimedia data, finding content that matches a user's current mood is a challenging problem. Choosing discriminative features to represent video segments is a key issue in designing video affective content analysis algorithms, where no dominant feature representation has yet emerged. Most existing affective content analysis methods either use low-level audio-visual features or generate hand-crafted higher-level representations. In this work, we propose to use deep learning methods, in particular convolutional neural networks (CNNs), to learn mid-level representations from automatically extracted raw features. We exploit only the audio modality in the current framework and employ Mel-Frequency Cepstral Coefficient (MFCC) features to build higher-level audio representations. We use the learned representations for the affective classification of music video clips, choosing multi-class support vector machines (SVMs) to classify the clips into affective categories. Preliminary results on a subset of the DEAP dataset show that a significant improvement is obtained when we learn higher-level representations instead of using low-level features directly for video affective content analysis. We plan to extend this work to include the visual modality as well: we will generate mid-level visual representations using CNNs and fuse them with the mid-level audio representations at both the feature and decision level for video affective content analysis.
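The pipeline described above (MFCC frames in, a learned mid-level descriptor out, then an SVM) can be sketched with a single convolution-ReLU-max-pooling stage. This is a minimal NumPy illustration, not the authors' implementation: the filter weights below are random stand-ins for the filters a trained CNN would learn, and the input sizes (100 frames, 13 coefficients, 8 filters of width 5) are assumed for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MFCC input for one video segment: 100 frames x 13 cepstral coefficients.
mfcc = rng.standard_normal((100, 13))

# One bank of 8 temporal filters, each spanning 5 consecutive frames.
# In the actual framework these weights would be learned by the CNN.
filters = rng.standard_normal((8, 5, 13))

def conv_relu_maxpool(x, w):
    """1-D convolution over time, ReLU, then global max-pooling:
    yields one mid-level feature per filter for the whole segment."""
    n_filters, width, _ = w.shape
    n_windows = x.shape[0] - width + 1
    feats = np.empty(n_filters)
    for f in range(n_filters):
        acts = [np.sum(x[t:t + width] * w[f]) for t in range(n_windows)]
        feats[f] = max(0.0, max(acts))  # ReLU followed by max-pooling
    return feats

midlevel = conv_relu_maxpool(mfcc, filters)
print(midlevel.shape)  # one fixed-length descriptor per segment
```

The resulting fixed-length descriptor is what would then be passed to the multi-class SVM; the global max-pooling is what makes segments of different lengths comparable.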