Acoustic classification and segmentation using modified spectral roll-off and variance-based features

Authors:
Marko Kos; Zdravko Kačič; Damjan Vlaj
Affiliations:
University of Maribor, Faculty of Electrical Engineering and Computer Science, Smetanova ul. 17, SI-2000 Maribor, Slovenia (all three authors)
Venue:
Digital Signal Processing
Year:
2013


Abstract

This paper presents novel features and an architecture for an automatic on-line acoustic classification and segmentation system. The system performs speech/non-speech segmentation (with an emphasis on accurate speech/music segmentation), gender segmentation, and speech bandwidth segmentation. It can easily be integrated into an automatic continuous speech recognition system, where information about individual acoustic segments can be used for acoustic model selection and adaptation, or as additional information in rich transcription output. Acoustic model adaptation can improve the speech recognition rate, and the additional information in rich transcription can be useful when searching for specific events or circumstances (e.g., a male speaker talking over a telephone line). For speech/non-speech segmentation we propose a new set of features based on the energy variance in a narrow frequency sub-band, called EVFB (Energy Variance of Filter Bank). The proposed features also prove to be efficient discriminators between speech and music. Segmentation cross-test results show that EVFB features are more robust than MFCC features. Two new features (modified spectral roll-off and high-frequency energy variance) are also proposed for speech bandwidth classification and segmentation. The results show good and robust performance of the automatic on-line acoustic segmentation system. All experiments and tests were performed on a radio broadcast database and the Slovenian BNSI Broadcast News database. Integrating the automatic on-line acoustic segmentation system into a continuous speech recognition system based on MFCC (mel-frequency cepstral coefficient) features adds only a small computational cost, because many of the proposed system's feature calculation procedures are shared with the MFCC calculation procedure.
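The abstract names the features but does not define them precisely. The following is a minimal Python sketch under stated assumptions, not the authors' method. It assumes that an EVFB-style feature is the sliding-window variance of the log energy in one filter-bank band, and that the high-frequency energy variance is the same statistic computed on the summed energy of the upper bands. Both operate on filter-bank energies that an MFCC front end already computes, which is consistent with the abstract's point about low extra cost. The roll-off function is the standard textbook spectral roll-off, not the paper's modified version, which is not described here. All function names, the 0.85 roll-off fraction, and the window lengths are hypothetical.

```python
import math
from statistics import pvariance

# Hypothetical helpers: a sketch of the feature ideas, NOT the paper's exact definitions.
EPS = 1e-10  # avoids log(0) for silent bands


def sliding_variance(values, window):
    """Population variance over each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [pvariance(values[t:t + window]) for t in range(len(values) - window + 1)]


def evfb_like(frames, band, window):
    """Assumed EVFB-style feature: variance over time of the log energy
    in one narrow filter-bank band.

    frames: list of frames, each a list of non-negative filter-bank energies
    (for example the mel filter-bank outputs already computed for MFCCs).
    """
    log_energy = [math.log(frame[band] + EPS) for frame in frames]
    return sliding_variance(log_energy, window)


def high_band_energy_variance(frames, first_high_band, window):
    """Assumed high-frequency energy variance: variance over time of the
    log energy summed over all bands with index >= first_high_band."""
    log_high = [math.log(sum(frame[first_high_band:]) + EPS) for frame in frames]
    return sliding_variance(log_high, window)


def spectral_rolloff(power_spectrum, fraction=0.85):
    """Standard (unmodified) spectral roll-off: the smallest bin index at
    which the cumulative power reaches `fraction` of the total power."""
    if not power_spectrum:
        raise ValueError("empty spectrum")
    threshold = fraction * sum(power_spectrum)
    cumulative = 0.0
    for k, p in enumerate(power_spectrum):
        cumulative += p
        if cumulative >= threshold:
            return k
    return len(power_spectrum) - 1


if __name__ == "__main__":
    steady = [[1.0, 1.0]] * 4                   # constant band energy
    varying = [[1.0, 1.0], [math.e, 1.0]] * 2   # band 0 alternates in level
    print(evfb_like(steady, band=0, window=2))  # [0.0, 0.0, 0.0]
    print(evfb_like(varying, band=0, window=2))  # each value close to 0.25
    print(spectral_rolloff([4.0, 3.0, 2.0, 1.0]))  # 2 (wideband-like spectrum)
    print(spectral_rolloff([4.0, 3.0, 0.0, 0.0]))  # 1 (band-limited spectrum)
```

The toy spectra show the intuition behind using a roll-off-type feature for bandwidth segmentation. When the upper bins carry no energy, as in band-limited telephone speech, the roll-off point moves to a lower bin. Likewise, a band whose energy fluctuates strongly over time yields a larger variance than a steady one. How the paper's actual features exploit these effects is not specified in the abstract.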