Multimedia event detection with multimodal feature fusion and temporal concept localization
Machine Vision and Applications
Music is an art form in which sounds are organized in time; however, current approaches to music similarity and classification largely ignore this temporal information. This paper presents an approach to automatic tagging that incorporates temporal aspects of music directly into the statistical models, in contrast to the typical bag-of-frames paradigm of traditional music information retrieval techniques. Vector quantization on song segments yields a vocabulary of acoustic segment models. An unsupervised, iterative process that alternates between Viterbi decoding and Baum-Welch estimation builds transcripts of songs in this vocabulary. Latent semantic analysis then converts the song transcripts into vectors for subsequent classification by a support vector machine trained for each tag. Experimental results show that the proposed approach performs better on 15 of the 18 tags. Further analysis demonstrates an ability to capture local timbral characteristics as well as sequential arrangements of acoustic segment models.
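The pipeline described in the abstract (quantize frames into a vocabulary of acoustic segments, transcribe each song in that vocabulary, then project term counts with latent semantic analysis) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it uses random toy features in place of real audio descriptors, plain k-means in place of the iterative Viterbi/Baum-Welch training of acoustic segment models, and a truncated SVD of raw counts for the LSA step; all names and dimensions below are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for frame-level acoustic features: 6 "songs",
# each 50 frames of 4-dimensional features (hypothetical data).
songs = [rng.normal(loc=i % 3, size=(50, 4)) for i in range(6)]


def kmeans(X, k, iters=20, seed=0):
    """Plain k-means as a stand-in for learning the acoustic segment vocabulary."""
    r = np.random.default_rng(seed)
    centers = X[r.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each frame to its nearest codeword.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        # Re-estimate codewords as cluster means.
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(0)
    return centers


def transcribe(X, centers):
    """Map each frame to its nearest vocabulary entry, producing a 'transcript'."""
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return dists.argmin(1)


k = 5  # vocabulary size (assumed for the sketch)
centers = kmeans(np.vstack(songs), k)
transcripts = [transcribe(s, centers) for s in songs]

# Term-document matrix: rows are songs, columns are vocabulary counts.
counts = np.stack([np.bincount(t, minlength=k) for t in transcripts]).astype(float)

# LSA step: truncated SVD of the count matrix gives low-rank song vectors,
# which would then feed a per-tag SVM classifier.
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
rank = 2
song_vectors = U[:, :rank] * S[:rank]
print(song_vectors.shape)  # (6, 2)
```

Note that a plain count vector discards the ordering within each transcript; the paper's temporal modeling comes from the segment models themselves, which this bag-of-counts sketch does not capture.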