Unsupervised tagging of Spanish lyrics dataset using clustering

  • Authors:
  • Fabio Leonardo Parra; Elizabeth León

  • Affiliations:
  • MIDAS group, Universidad Nacional de Colombia, Bogotá, D.C., Colombia (both authors)

  • Venue:
  • MLDM'13: Proceedings of the 9th International Conference on Machine Learning and Data Mining in Pattern Recognition
  • Year:
  • 2013

Abstract

In this paper, an approach to music clustering that uses only lyrics features is developed to identify groups of songs with similar feelings, content, or emotions. A collection of 30,000 Spanish lyrics was used for this study. The songs were represented in a vector space model (bag of words, BOW), and part-of-speech (POS) techniques were applied during preprocessing. Partitional and hierarchical methods were used for clustering, and the appropriate number of clusters (k) was estimated. The clustering results were evaluated with internal measures such as the Davies-Bouldin Index (DBI) and intra-cluster and inter-cluster similarity. Finally, the resulting clusters were tagged using top words and association rules. The experiments show that songs can be organized into related groups and tagged with unsupervised techniques such as clustering, using lyrics information alone.
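The abstract names the pipeline steps but not their implementation. The following is a minimal standard-library Python sketch of that kind of pipeline, not the authors' code, and it rests on several assumptions that do not come from the paper. Whitespace tokenization with a small hypothetical stopword set stands in for the POS-based preprocessing. Plain Euclidean k-means represents the partitional methods, and hierarchical clustering is not shown. k is picked as the candidate with the lowest DBI. Intra similarity is taken to be the mean pairwise cosine similarity within clusters, and inter similarity the mean cosine similarity between centroids. Tags are the most frequent terms per cluster, and association rules are not shown. All function names are hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical stopword set; a stand-in for the paper's POS-based preprocessing.
STOPWORDS = frozenset({"el", "la", "de", "que", "y", "en", "mi", "tu"})


def bag_of_words(docs, stopwords=STOPWORDS):
    """Whitespace-tokenize and build term-frequency vectors over a shared vocabulary."""
    counts = [Counter(w for w in d.lower().split() if w not in stopwords) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[float(c[w]) for w in vocab] for c in counts]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def kmeans(vectors, k, iters=100, seed=0):
    """Lloyd's k-means, used here as one example of a partitional method."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: euclidean(v, centroids[c])) for v in vectors]
        if new == labels:  # assignments are stable: converged
            break
        labels = new
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def davies_bouldin(vectors, labels, centroids):
    """Davies-Bouldin index over the non-empty clusters; lower is better."""
    clusters = sorted(set(labels))
    scatter = {}
    for c in clusters:
        members = [v for v, lab in zip(vectors, labels) if lab == c]
        scatter[c] = sum(euclidean(v, centroids[c]) for v in members) / len(members)
    worst = []
    for i in clusters:
        ratios = []
        for j in clusters:
            if j != i:
                d = euclidean(centroids[i], centroids[j])
                ratios.append((scatter[i] + scatter[j]) / d if d else math.inf)
        worst.append(max(ratios, default=math.inf))
    return sum(worst) / len(worst)


def choose_k(vectors, candidates, seed=0):
    """Assumed rule: keep the candidate k whose k-means result has the lowest DBI."""
    runs = []
    for k in candidates:
        labels, centroids = kmeans(vectors, k, seed=seed)
        runs.append((davies_bouldin(vectors, labels, centroids), k, labels, centroids))
    return min(runs, key=lambda run: run[0])


def intra_inter_similarity(vectors, labels, centroids):
    """Assumed definitions: mean pairwise cosine within clusters, mean cosine between centroids."""
    intra = [cosine(a, b) for i, (a, la) in enumerate(zip(vectors, labels))
             for b, lb in zip(vectors[i + 1:], labels[i + 1:]) if la == lb]
    used = sorted(set(labels))
    inter = [cosine(centroids[i], centroids[j]) for i in used for j in used if i < j]
    mean_intra = sum(intra) / len(intra) if intra else 0.0
    mean_inter = sum(inter) / len(inter) if inter else 0.0
    return mean_intra, mean_inter


def top_words(vocab, vectors, labels, cluster, n=3):
    """Tag a cluster with its n most frequent terms (summed term frequencies)."""
    totals = Counter()
    for vec, lab in zip(vectors, labels):
        if lab == cluster:
            totals.update({vocab[i]: c for i, c in enumerate(vec) if c})
    return [w for w, _ in totals.most_common(n)]


if __name__ == "__main__":
    docs = ["amor corazon beso amor", "corazon amor beso", "amor beso corazon corazon",
            "fiesta baile noche fiesta", "baile noche fiesta", "noche baile baile fiesta"]
    vocab, vectors = bag_of_words(docs)
    dbi, k, labels, centroids = choose_k(vectors, [2, 3])
    print("k =", k, "DBI =", round(dbi, 3), "labels =", labels)
    print("intra, inter:", intra_inter_similarity(vectors, labels, centroids))
    for c in sorted(set(labels)):
        print(c, top_words(vocab, vectors, labels, c))
```

The toy "lyrics" in the demo are invented for illustration. On a corpus the size of the one described (30,000 lyrics), these pure-Python loops would be slow. Library routines such as scikit-learn's `KMeans`, `AgglomerativeClustering`, and `davies_bouldin_score` would be a more practical basis and would also cover the hierarchical case.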