Online speech/music segmentation based on the variance mean of filter bank energy

Authors:
Marko Kos;Matej Grašič;Zdravko Kačič
Affiliations:
Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia;Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia;Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia
Venue:
EURASIP Journal on Advances in Signal Processing
Year:
2009

Abstract

This paper presents a novel feature for online speech/music segmentation based on the variance mean of filter bank energy (VMFBE). The feature is motivated by the way energy varies within a narrow frequency sub-band: it changes more rapidly, and to a greater extent, for speech than for music. The energy variance in such a sub-band is therefore greater for speech than for music. A radio broadcast database and the BNSI broadcast news database were used to evaluate the feature's discrimination and segmentation ability. The VMFBE calculation procedure shares 4 of its 6 steps with the MFCC feature calculation procedure. This makes it a convenient speech/music discriminator for real-time automatic speech recognition systems based on MFCC features, because most of the processing is shared and the computational load increases only slightly. Analysis of the feature's speech/music discriminative ability shows an average error rate below 10% for radio broadcast material, and it outperforms the other features used for comparison by more than 8%. Used as a standalone speech/music discriminator in a segmentation system, the proposed feature achieves an overall accuracy of over 94% on radio broadcast material.
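
The idea behind the feature can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the six calculation steps, so the code below assumes 25 ms Hamming-windowed frames with a 10 ms hop at 16 kHz, equal-width rectangular bands in place of a mel filter bank, log band energies, and a single analysis segment. The function names `frame_signal`, `band_log_energies`, and `vmfbe` are hypothetical. The sketch computes the variance of each band's energy over time and then averages those variances across bands.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def band_log_energies(x, frame_len=400, hop=160, n_bands=20):
    """Log energy per frame and per band.

    Assumption: equal-width rectangular bands stand in for the mel
    filter bank that an MFCC front end would normally use.
    """
    frames = frame_signal(x, frame_len, hop) * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    edges = np.linspace(0, power.shape[1], n_bands + 1).astype(int)
    energies = np.stack(
        [power[:, a:b].sum(axis=1) for a, b in zip(edges[:-1], edges[1:])],
        axis=1,
    )
    return np.log(energies + 1e-10)  # small offset avoids log(0)

def vmfbe(x, **kwargs):
    """Mean over bands of the per-band energy variance over time."""
    e = band_log_energies(x, **kwargs)
    return e.var(axis=0).mean()

# Toy comparison: steady noise versus the same noise with a slow
# (4 Hz) amplitude modulation, a crude stand-in for speech-like
# energy fluctuation. These are synthetic signals, not real audio.
sr = 16000
rng = np.random.default_rng(0)
t = np.arange(2 * sr) / sr
steady = rng.standard_normal(t.size)
modulated = steady * (0.55 + 0.45 * np.sin(2 * np.pi * 4 * t))
print(vmfbe(steady), vmfbe(modulated))
```

On these synthetic signals the modulated input yields the larger value, which matches the abstract's reasoning that stronger sub-band energy variation gives a higher variance. Real speech and music would need the actual filter bank and segment length used in the paper.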