Generating ground truth for music mood classification using mechanical turk

  • Authors:
  • Jin Ha Lee;Xiao Hu

  • Affiliations:
  • The Information School, Seattle, WA, USA;University of Denver, Denver, USA

  • Venue:
  • Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Mood is an important access point in music digital libraries and online music repositories, but generating ground truth for evaluating various music mood classification algorithms is a challenging problem. This is because collecting enough human judgments is time-consuming and costly due to the subjectivity of music mood. In this study, we explore the viability of crowdsourcing music mood classification judgments using Amazon Mechanical Turk (MTurk). Specifically, we compare the mood classification judgments collected for the annual Music Information Retrieval Evaluation eXchange (MIREX) with judgments collected using MTurk. Our data show that the overall distribution of mood clusters and agreement rates from MIREX and MTurk were comparable. However, Turkers tended to agree less with the pre-labeled mood clusters than MIREX evaluators. The system evaluation results generated using both sets of data were mostly the same except for detecting one statistically significant pair using Friedman's test. We conclude that MTurk can potentially serve as a viable alternative for ground truth collection, with some reservation with regards to particular mood clusters.