Automatic music video generation: cross matching of music and image
Proceedings of the 20th ACM international conference on Multimedia
Human perception of music and of images is highly correlated: both can evoke sensations such as emotion and power. This paper investigates how to model the relationship between music and images using 47,888 music-image pairs extracted from music videos. We make two basic observations about this relationship: 1) the music space exhibits a simpler cluster structure than the image space, and 2) the relationship between the two spaces is complex and nonlinear. Based on these observations, we develop Multiple Ranking Canonical Correlation Analysis (MR-CCA) to learn this relationship. MR-CCA clusters the music-image pairs according to their music parts and then conducts Ranking CCA (R-CCA) within each cluster. Compared with classical CCA, R-CCA takes into account the pairwise ranking information available in our dataset. MR-CCA improves performance and significantly reduces computational cost. Experimental results show that R-CCA outperforms CCA, and that MR-CCA performs best, with a consistency score of 84.52% against human labeling. The proposed method can be generalized to model cross-media relationships and has potential applications in video generation, background music recommendation, and joint retrieval of music and images.
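The cluster-then-correlate structure described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. The abstract does not specify R-CCA's ranking term, so classical regularized CCA is used here as a stand-in within each cluster. All function names, the toy data generator, the feature dimensions, and the cosine-based matching score are assumptions made for illustration only.

```python
import numpy as np

def nearest(X, centers):
    """Index of the closest center for each row of X."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns the cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = nearest(X, centers)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def fit_cca(X, Y, n_components, reg=1e-3):
    """Classical CCA (stand-in for R-CCA) via whitening and an SVD."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = len(X)
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    Lx, Ly = np.linalg.cholesky(Cxx), np.linalg.cholesky(Cyy)
    # M = Lx^{-1} Cxy Ly^{-T}: cross-covariance of the whitened views
    M = np.linalg.solve(Lx, Cxy)
    M = np.linalg.solve(Ly, M.T).T
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    Wx = np.linalg.solve(Lx.T, U[:, :n_components])
    Wy = np.linalg.solve(Ly.T, Vt.T[:, :n_components])
    return mx, my, Wx, Wy, s[:n_components]

def fit_mr_cca(music, image, k=2, n_components=2):
    """Cluster pairs by their music features, then fit one CCA per cluster."""
    centers = kmeans(music, k)
    labels = nearest(music, centers)
    models = []
    for j in range(k):
        idx = labels == j
        if idx.sum() < 2:  # degenerate cluster: fall back to all pairs
            idx = np.ones(len(music), dtype=bool)
        models.append(fit_cca(music[idx], image[idx], n_components))
    return centers, models

def match_score(model, m, im):
    """Cosine similarity of the projections (an assumed scoring rule)."""
    centers, models = model
    j = nearest(m[None, :], centers)[0]
    mx, my, Wx, Wy, _ = models[j]
    u, v = (m - mx) @ Wx, (im - my) @ Wy
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def toy_pairs(n=400, seed=0):
    """Synthetic pairs: two music clusters, each with its own linear link."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, 2))          # shared latent factor
    c = rng.integers(0, 2, size=n)       # hidden cluster id
    music, image = np.empty((n, 5)), np.empty((n, 6))
    for j in range(2):
        A, B = rng.normal(size=(2, 5)), rng.normal(size=(2, 6))
        idx = c == j
        music[idx] = z[idx] @ A + 10.0 * j
        image[idx] = z[idx] @ B
    music += 0.1 * rng.normal(size=music.shape)
    image += 0.1 * rng.normal(size=image.shape)
    return music, image

if __name__ == "__main__":
    music, image = toy_pairs()
    model = fit_mr_cca(music, image)
    print(match_score(model, music[0], image[0]))
```

Fitting one small model per music cluster is what makes the piecewise approach cheaper than a single global model, and it lets each cluster carry its own linear map, which approximates a nonlinear relationship overall. A faithful R-CCA would additionally incorporate the pairwise ranking information mentioned in the abstract, which this sketch omits.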