Expression as complex and personal as music is not adequately represented by the signal alone. We define and model meaning in music as the mapping between the acoustic signal and its contextual interpretation: the 'community metadata' of popularity, description, and personal reaction, collected from reviews, usage, and discussion. In this thesis we present a framework for capturing community metadata from free-text sources, audio representations general enough to work across domains of music, and a machine learning framework that learns the relationship between music signals and contextual reaction iteratively and at large scale. Our work is evaluated and applied as 'semantic basis functions': meaning classifiers used to maximize the semantic content captured from a perceptual signal. This process improves upon purely statistical rank-reduction methods because it models a community's reaction to perception rather than relationships found in the signal alone. We show increased accuracy on common music retrieval tasks when audio is projected through semantic basis functions. We also evaluate our models on a 'query-by-description' task for music, predicting description and community interpretation directly from audio. These unbiased learning approaches show superior accuracy on music and multimedia intelligence tasks such as similarity, classification, and recommendation. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
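The idea of semantic basis functions can be sketched roughly as follows: a bank of per-term classifiers, trained on community metadata, projects raw audio features into a low-dimensional semantic space, and retrieval tasks (similarity, query-by-description) then operate in that space. This is a minimal illustrative sketch, not the thesis's actual features or models; the feature dimensions, the linear classifiers `W`, `b`, and the term index are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 100 songs, 40-dim audio features, 12 semantic terms
# (e.g. "funky", "loud", ...). W and b stand in for term classifiers that
# would really be trained against community metadata.
n_songs, n_audio, n_terms = 100, 40, 12
X = rng.normal(size=(n_songs, n_audio))   # audio feature vectors
W = rng.normal(size=(n_audio, n_terms))   # one linear classifier per term
b = rng.normal(size=n_terms)

def semantic_projection(X, W, b):
    """Project audio through the term classifiers: each output dimension
    is a score for how strongly a term applies to a song."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))  # per-term sigmoid scores

S = semantic_projection(X, W, b)          # (n_songs, n_terms) semantic space

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Music similarity computed in the reduced semantic space rather than on
# the raw signal features alone:
sims = [cosine_sim(S[0], S[i]) for i in range(1, n_songs)]
nearest = 1 + int(np.argmax(sims))        # most semantically similar song

# Query-by-description: rank all songs by their score on one chosen term.
term = 3                                  # hypothetical index of a term
ranking = np.argsort(-S[:, term])         # best-matching songs first
```

The projection step is the only part specific to the approach; everything downstream (nearest-neighbor similarity, ranking) is standard once songs live in the semantic space.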