Acoustic classification and segmentation using modified spectral roll-off and variance-based features

  • Authors:
  • Marko Kos; Zdravko Kačič; Damjan Vlaj

  • Affiliations:
  • University of Maribor, Faculty of Electrical Engineering and Computer Science, Smetanova ul. 17, SI-2000 Maribor, Slovenia;University of Maribor, Faculty of Electrical Engineering and Computer Science, Smetanova ul. 17, SI-2000 Maribor, Slovenia;University of Maribor, Faculty of Electrical Engineering and Computer Science, Smetanova ul. 17, SI-2000 Maribor, Slovenia

  • Venue:
  • Digital Signal Processing
  • Year:
  • 2013

Abstract

This paper presents novel features and an architecture for an automatic on-line acoustic classification and segmentation system. The system includes speech/non-speech segmentation (with an emphasis on accurate speech/music segmentation), gender segmentation, and speech bandwidth segmentation. It can be easily integrated into an automatic continuous speech recognition system, where information about individual acoustic segments can be used for acoustic model selection and adaptation, or as additional information for rich transcription output. Acoustic model adaptation can improve the speech recognition rate, and the additional information in a rich transcription can be useful when searching for certain events or circumstances (a male speaker talking over a phone line, etc.). For speech/non-speech segmentation we propose a new set of features, called EVFB (Energy Variance of Filter Bank), which are based on the energy variance in a narrow frequency sub-band. The proposed features also prove to be an efficient discriminator between speech and music. Segmentation cross-test results show that EVFB features are more robust than MFCC features. Two new features (modified spectral roll-off and high-frequency energy variance) are also proposed for speech bandwidth classification and segmentation. The results show good and robust performance by the automatic on-line acoustic segmentation system. All experiments and tests were performed on a radio broadcast database and the Slovenian BNSI Broadcast News database. Integrating the automatic on-line acoustic segmentation system into a continuous speech recognition system based on MFCC (mel-frequency cepstral coefficients) features requires only a small additional computational cost, because many of the proposed system's feature calculation procedures are shared with the MFCC feature calculation procedure.
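
The abstract names the feature families but does not give their formulas, so the following is only a rough sketch of the general idea, not the authors' definitions. It assumes that filter-bank channel energies are already available per frame (as in an MFCC front end), that "energy variance" can be approximated by the variance over time of the log energy in each channel, and that the conventional (unmodified) spectral roll-off is an acceptable stand-in for the paper's modified version. The function names, the logarithm, the energy floor, and the 0.85 roll-off fraction are all illustrative assumptions.

```python
import math
from statistics import pvariance


def band_energy_variance(fbank_frames):
    """Variance over time of the log energy in each filter-bank channel.

    fbank_frames: list of frames, each a list of non-negative channel
    energies. Returns one variance value per channel.
    Hypothetical helper: the paper's exact EVFB definition may differ.
    """
    if not fbank_frames:
        raise ValueError("need at least one frame")
    n_channels = len(fbank_frames[0])
    eps = 1e-10  # assumed floor to avoid log(0)
    variances = []
    for ch in range(n_channels):
        log_energies = [math.log(frame[ch] + eps) for frame in fbank_frames]
        variances.append(pvariance(log_energies))
    return variances


def spectral_rolloff_bin(power_spectrum, fraction=0.85):
    """Smallest bin index at which the cumulative power reaches
    `fraction` of the total power (conventional roll-off, not the
    paper's modified variant)."""
    total = sum(power_spectrum)
    if total <= 0:
        return 0
    threshold = fraction * total
    cumulative = 0.0
    for k, power in enumerate(power_spectrum):
        cumulative += power
        if cumulative >= threshold:
            return k
    return len(power_spectrum) - 1


if __name__ == "__main__":
    # Two frames, two channels: channel 0 is steady, channel 1 fluctuates.
    frames = [[1.0, 1.0], [1.0, 4.0]]
    print(band_energy_variance(frames))
    # Flat spectrum: half of the power is reached at the second bin.
    print(spectral_rolloff_bin([1.0, 1.0, 1.0, 1.0], fraction=0.5))
```

Because the inputs are the same filter-bank energies an MFCC front end already computes, a sketch like this adds only a variance and a cumulative sum per analysis window, which is in line with the small additional computational cost the abstract mentions.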