Beat space segmentation and octave scale cepstral feature for sung language recognition in pop music

  • Authors:
  • Namunu C. Maddage;Haizhou Li

  • Affiliations:
  • Royal Melbourne Institute of Technology University (RMIT), Melbourne, Australia;Institute for Infocomm Research (I2R), Singapore

  • Venue:
  • ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Sung language recognition relies on both effective feature extraction and acoustic modeling. In this paper, we study rhythm based music segmentation with the frame size being the duration of the smallest note in the music, as opposed to fixed length segmentation in spoken language recognition. It is found that acoustic features extracted from the rhythm based segmentation scheme outperform those from fixed length segmentation. We also study the effectiveness of a musically motivated acoustic feature. Octave scale cepstral coefficients (OSCCs) by comparing with the other acoustic features: Log frequency cepstral coefficients, Linear prediction coefficients (LPC) and LPC-derived cepstral coefficients. Finally, we examine the modeling capabilities of Gaussian mixture models and support vector machines in sung language recognition experiments. Experiments conducted on a corpus of 400 popular songs sung in English, Chinese, German, and Indonesian, showed that the OSCC feature outperforms other features. A sung language recognition accuracy of 64.9% was achieved when Gaussian mixture models were trained on shifted-delta-OSCC acoustic features, extracted via rhythm based music segmentation.