Speech/music discrimination in audio podcast using structural segmentation and timbre recognition

  • Authors:
  • Mathieu Barthet; Steven Hargreaves; Mark Sandler

  • Affiliations:
  • Centre for Digital Music, Queen Mary University of London, London, United Kingdom (all authors)

  • Venue:
  • CMMR'10 Proceedings of the 7th international conference on Exploring music contents
  • Year:
  • 2010

Abstract

We propose two speech/music discrimination methods using timbre models and measure their performance on a 3-hour database of BBC radio podcasts. In the first method, the machine-estimated classifications obtained with an automatic timbre recognition (ATR) model are post-processed using median filtering. The classification system (LSF/K-means) was trained at two taxonomic levels: a high-level one (speech, music) and a lower-level one (male and female speech, classical, jazz, rock & pop). The second method combines automatic structural segmentation and timbre recognition (ASS/ATR). The ASS evaluates the similarity between feature distributions (MFCC, RMS) using HMM and soft K-means algorithms. Both methods were evaluated at the semantic level (relative correct overlap, RCO) and at the temporal level (boundary retrieval F-measure). The ASS/ATR method obtained the best results (an average RCO of 94.5% and a boundary F-measure of 50.1%). These performances compared favourably with those obtained by an SVM-based technique that provides a good benchmark of the state of the art.
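The median-filtering post-processing and the RCO metric mentioned above can be sketched in a few lines. This is an illustrative toy example, not the paper's implementation: the frame labels, frame rate, and kernel size below are invented for demonstration, and the paper's actual filter length and evaluation tooling are not specified here.

```python
import numpy as np
from scipy.signal import medfilt

# Hypothetical frame-wise classifier output: 0 = speech, 1 = music.
# Isolated misclassifications inside otherwise homogeneous regions are
# smoothed out by a median filter, as in the ATR post-processing step.
raw = np.array([0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1])
smoothed = medfilt(raw, kernel_size=3).astype(int)

# Relative correct overlap (RCO): the fraction of frames whose smoothed
# label agrees with a ground-truth annotation of the same podcast.
truth = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1])
rco = float(np.mean(smoothed == truth))
```

On this toy sequence the filter removes both single-frame errors, so the RCO reaches 1.0; on real podcast audio the kernel size trades off noise suppression against the risk of erasing genuinely short segments.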