The acoustic-visual emotion Gaussians model for automatic generation of music video

  • Authors:
  • Ju-Chiang Wang; Yi-Hsuan Yang; I-Hong Jhuo; Yen-Yu Lin; Hsin-Min Wang

  • Affiliations:
  • Academia Sinica, Taipei City, Taiwan, ROC; Academia Sinica, Taipei City, Taiwan, ROC; National Taiwan University, Taipei City, Taiwan, ROC; Academia Sinica, Taipei City, Taiwan, ROC; Academia Sinica, Taipei City, Taiwan, ROC

  • Venue:
  • Proceedings of the 20th ACM international conference on Multimedia
  • Year:
  • 2012

Abstract

This paper presents a novel content-based system that uses the perceived emotion of multimedia content as a bridge between music and video. Specifically, we propose a novel machine learning framework, called Acoustic-Visual Emotion Gaussians (AVEG), to jointly learn the tripartite relationship among music, video, and emotion from an emotion-annotated corpus of music videos. For a music piece (or a video sequence), the AVEG model predicts its emotion distribution in a stochastic emotion space from the corresponding low-level acoustic (resp. visual) features. Finally, music and video are matched by measuring the similarity between their emotion distributions, using a distance measure such as the Kullback-Leibler (KL) divergence.
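To make the matching step concrete, here is a minimal sketch of the final ranking stage under a strong simplifying assumption: each item's emotion distribution is summarized as a single 2-D Gaussian over a valence-arousal plane, so the KL divergence has a closed form. The AVEG model itself learns richer stochastic emotion representations, for which KL divergence is typically approximated; the names, distributions, and numbers below are illustrative, not taken from the paper.

```python
import numpy as np

def gaussian_kl(mu0, cov0, mu1, cov1):
    """Closed-form KL(N0 || N1) between two multivariate Gaussians.

    A single-Gaussian simplification; AVEG's actual emotion
    distributions are richer, and their KL must be approximated.
    """
    k = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (
        np.trace(cov1_inv @ cov0)      # trace term
        + diff @ cov1_inv @ diff       # Mahalanobis term
        - k                            # dimensionality offset
        + np.log(np.linalg.det(cov1) / np.linalg.det(cov0))
    )

# Hypothetical emotion distributions in a 2-D valence-arousal space:
# one predicted from a music clip's acoustic features, and two
# candidates predicted from videos' visual features.
music = (np.array([0.6, 0.4]), np.diag([0.05, 0.08]))
videos = {
    "video_a": (np.array([0.55, 0.35]), np.diag([0.06, 0.07])),
    "video_b": (np.array([-0.30, 0.70]), np.diag([0.04, 0.05])),
}

# Match music to video: rank candidates by symmetrized KL distance.
scores = {}
for name, (mu, cov) in videos.items():
    scores[name] = 0.5 * (gaussian_kl(music[0], music[1], mu, cov)
                          + gaussian_kl(mu, cov, music[0], music[1]))
for name in sorted(scores, key=scores.get):
    print(f"{name}: symmetrized KL = {scores[name]:.3f}")
```

Because KL divergence is asymmetric, the sketch symmetrizes it before ranking; the abstract only commits to "a distance measure such as KL divergence", so other measures (e.g., Jensen-Shannon divergence) would slot in the same way.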