Bilingual analysis of song lyrics and audio words

  • Authors:
  • Jen-Yu Liu, Chin-Chia Yeh, Yi-Hsuan Yang, Yuan-Ching Teng

  • Affiliations:
  • Research Center for IT Innovation, Academia Sinica, Taipei, Taiwan, ROC (all authors)

  • Venue:
  • Proceedings of the 20th ACM international conference on Multimedia
  • Year:
  • 2012

Abstract

Thanks to advances in music audio analysis, state-of-the-art techniques can now detect musical attributes such as timbre, rhythm, and pitch with a certain level of reliability and effectiveness. An emerging body of research has begun to model the high-level perceptual properties of music listening, including the mood and the preferred listening context of a music piece. Towards this goal, we propose a novel text-like feature representation that encodes the rich and time-varying information of music using a composite of features extracted from the song lyrics and audio signals. In particular, we investigate dictionary learning algorithms to optimize the generation of local feature descriptors, as well as probabilistic topic models to group semantically relevant text and audio words. This text-like representation leads to significant improvement in automatic mood classification over conventional audio features.
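The core idea of the abstract — quantizing audio frames into "audio words" and concatenating their counts with lyric term counts into one text-like bag-of-words vector — can be sketched as follows. This is a minimal illustration, not the authors' implementation: it substitutes a toy k-means codebook for the dictionary-learning step, omits the topic-model grouping, and all function names, the vocabulary, and the synthetic frame features are hypothetical.

```python
import numpy as np

def learn_codebook(frames, k, iters=20, seed=0):
    """Toy k-means codebook over frame features; each centroid plays
    the role of one 'audio word' (a stand-in for dictionary learning)."""
    rng = np.random.default_rng(seed)
    centroids = frames[rng.choice(len(frames), k, replace=False)].copy()
    for _ in range(iters):
        # Assign each frame to its nearest centroid.
        dist = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dist.argmin(1)
        for j in range(k):
            members = frames[assign == j]
            if len(members):
                centroids[j] = members.mean(0)
    return centroids

def audio_word_counts(frames, codebook):
    """Quantize frames against the codebook and count audio-word hits."""
    dist = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return np.bincount(dist.argmin(1), minlength=len(codebook))

def joint_bow(lyric_tokens, vocab, frames, codebook):
    """Concatenate lyric term counts with audio-word counts into a single
    text-like vector, mirroring the composite representation described."""
    text = np.zeros(len(vocab), dtype=int)
    for tok in lyric_tokens:
        if tok in vocab:  # out-of-vocabulary tokens are ignored
            text[vocab[tok]] += 1
    return np.concatenate([text, audio_word_counts(frames, codebook)])

# Synthetic example: 50 eight-dimensional frame features, 4 audio words.
frames = np.random.default_rng(1).normal(size=(50, 8))
codebook = learn_codebook(frames, k=4)
vocab = {"love": 0, "rain": 1}
vec = joint_bow(["love", "love", "rain", "sun"], vocab, frames, codebook)
```

The resulting vector has one slot per lyric term plus one per audio word, so downstream text-oriented models (e.g. the topic models the abstract mentions) can treat both modalities uniformly as word counts.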