Unsupervised tagging of Spanish lyrics dataset using clustering

Authors:
Fabio Leonardo Parra;Elizabeth León
Affiliations:
MIDAS group, Universidad Nacional de Colombia, Bogotá, D.C., Colombia;MIDAS group, Universidad Nacional de Colombia, Bogotá, D.C., Colombia
Venue:
MLDM'13 Proceedings of the 9th international conference on Machine Learning and Data Mining in Pattern Recognition
Year:
2013

Abstract

This paper develops an approach to music clustering that uses only lyrics features to identify groups of songs with similar feelings, content, or emotions. The study uses a collection of 30,000 Spanish lyrics. The songs were represented in a vector space model (bag of words, BOW), and part-of-speech (POS) techniques were applied during preprocessing. Partitional and hierarchical methods were used for clustering, and the appropriate number of clusters (k) was estimated. The clustering results were evaluated with internal measures such as the Davies-Bouldin Index (DBI) and intra-cluster and inter-cluster similarity. Finally, the resulting clusters were tagged using top words and association rules. Experiments show that music can be organized into related groups and tagged with unsupervised techniques such as clustering, using lyrics information alone.
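
As a rough illustration of the pipeline the abstract outlines (bag-of-words vectors, a partitional clustering step, and tagging clusters by their top words), the following minimal sketch may help. It is not the authors' implementation: the choice of spherical k-means with cosine similarity, the raw term-frequency weighting, the first-k seeding, and the toy token lists are all assumptions made for this example, and the POS preprocessing, hierarchical methods, DBI evaluation, and association rules mentioned in the abstract are left out.

```python
import math
from collections import Counter

def bow_vectors(docs):
    """Turn tokenised documents into unit-length term-frequency vectors (sparse dicts).
    Assumes every document has at least one token."""
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        norm = math.sqrt(sum(c * c for c in counts.values()))
        vectors.append({term: c / norm for term, c in counts.items()})
    return vectors

def cosine(a, b):
    """Dot product of two sparse vectors; this is the cosine similarity
    because both inputs are unit length."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(term, 0.0) for term, w in a.items())

def centroid(vectors):
    """Sum the member vectors and rescale the result to unit length."""
    total = Counter()
    for v in vectors:
        for term, w in v.items():
            total[term] += w
    norm = math.sqrt(sum(w * w for w in total.values()))
    return {term: w / norm for term, w in total.items()}

def kmeans(vectors, k, iterations=10):
    """Spherical k-means with a deterministic start (assumption for this
    sketch): the first k vectors serve as the initial centroids."""
    centroids = [dict(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each song to its most similar centroid.
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                  for v in vectors]
        # Recompute each centroid from its current members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = centroid(members)
    return labels, centroids

def top_words(centroids, n=2):
    """Tag each cluster with its n highest-weighted centroid terms
    (ties broken alphabetically)."""
    return [[term for term, _ in
             sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
            for c in centroids]

# Invented toy lyrics, already tokenised and stripped of stop words.
docs = [
    ["amor", "corazon", "amor", "beso"],
    ["fiesta", "bailar", "noche", "fiesta"],
    ["corazon", "amor", "llorar"],
    ["bailar", "fiesta", "ritmo"],
]
vectors = bow_vectors(docs)
labels, centroids = kmeans(vectors, k=2)
tags = top_words(centroids)
print(labels)  # [0, 1, 0, 1]
print(tags)    # [['amor', 'corazon'], ['fiesta', 'bailar']]
```

On this toy input the two love-themed fragments and the two party-themed fragments end up in separate clusters, and each cluster's tag is simply its heaviest centroid terms. On a real collection, k would have to be estimated, for example by comparing an internal measure such as DBI across candidate values.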