Pattern Recognition with Fuzzy Objective Function Algorithms
Hierarchical movie affective content analysis based on arousal and valence features
MM '08 Proceedings of the 16th ACM international conference on Multimedia
ISM '08 Proceedings of the 2008 Tenth IEEE International Symposium on Multimedia
Latent topic driving model for movie affective scene classification
MM '09 Proceedings of the 17th ACM international conference on Multimedia
Music video affective understanding using feature importance analysis
Proceedings of the ACM International Conference on Image and Video Retrieval
ICIP'09 Proceedings of the 16th IEEE international conference on Image processing
Affective content analysis of music video clips
MIRUM '11 Proceedings of the 1st international ACM workshop on Music information retrieval with user-centered and multimodal strategies
DEAP: A Database for Emotion Analysis; Using Physiological Signals
IEEE Transactions on Affective Computing
IEEE Transactions on Multimedia
A Large Scale Experiment for Mood-Based Classification of TV Programmes
ICME '12 Proceedings of the 2012 IEEE International Conference on Multimedia and Expo
3D Convolutional Neural Networks for Human Action Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence
Among the ever-growing volume of available multimedia data, finding content that matches a user's current mood is a challenging problem. Choosing discriminative features to represent video segments is a key issue in designing video affective content analysis algorithms, where no dominant feature representation has yet emerged. Most existing affective content analysis methods either use low-level audio-visual features or generate hand-crafted higher-level representations. In this work, we propose to use deep learning methods, in particular convolutional neural networks (CNNs), to learn mid-level representations from automatically extracted raw features. We exploit only the audio modality in the current framework and employ Mel-Frequency Cepstral Coefficient (MFCC) features to build higher-level audio representations. We use the learned representations for the affective classification of music video clips, choosing multi-class support vector machines (SVMs) to classify the clips into affective categories. Preliminary results on a subset of the DEAP dataset show that a significant improvement is obtained when we learn higher-level representations instead of using low-level features directly for video affective content analysis. We plan to extend this work to include the visual modality as well: we will generate mid-level visual representations using CNNs and fuse them with the mid-level audio representations at both the feature and decision level for video affective content analysis.
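The pipeline described above (MFCC frames in, a learned mid-level descriptor out, then an SVM) can be sketched with a single convolution-ReLU-max-pooling stage. This is a minimal NumPy illustration, not the authors' implementation: the filter weights below are random stand-ins for the filters a trained CNN would learn, and the input sizes (100 frames, 13 coefficients, 8 filters of width 5) are assumed for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MFCC input for one video segment: 100 frames x 13 cepstral coefficients.
mfcc = rng.standard_normal((100, 13))

# One bank of 8 temporal filters, each spanning 5 consecutive frames.
# In the actual framework these weights would be learned by the CNN.
filters = rng.standard_normal((8, 5, 13))

def conv_relu_maxpool(x, w):
    """1-D convolution over time, ReLU, then global max-pooling:
    yields one mid-level feature per filter for the whole segment."""
    n_filters, width, _ = w.shape
    n_windows = x.shape[0] - width + 1
    feats = np.empty(n_filters)
    for f in range(n_filters):
        acts = [np.sum(x[t:t + width] * w[f]) for t in range(n_windows)]
        feats[f] = max(0.0, max(acts))  # ReLU followed by max-pooling
    return feats

midlevel = conv_relu_maxpool(mfcc, filters)
print(midlevel.shape)  # one fixed-length descriptor per segment
```

The resulting fixed-length descriptor is what would then be passed to the multi-class SVM; the global max-pooling is what makes segments of different lengths comparable.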