Speech/music discrimination using Mel-cepstrum modulation energy

Authors:
Bong-Wan Kim;Dae-Lim Choi;Yong-Ju Lee
Affiliations:
Speech Information Technology and Industry Promotion Center, Wonkwang University, Korea;Speech Information Technology and Industry Promotion Center, Wonkwang University, Korea;Division of Electrical Electronic and Information Engineering, Wonkwang University, Korea
Venue:
TSD'07 Proceedings of the 10th international conference on Text, speech and dialogue
Year:
2007

Abstract

In this paper, we propose Mel-cepstrum modulation energy (MCME), an extension of modulation energy (ME), as a feature for discriminating speech from music. MCME is extracted from the time trajectories of Mel-frequency cepstral coefficients (MFCC), whereas ME is computed from the spectrum. Because cepstral coefficients are mutually uncorrelated, we expect MCME to perform better than ME. To find the best modulation frequency for MCME, we run experiments with modulation frequencies from 4 Hz to 20 Hz and compare the results with those obtained from ME and from MFCC-based cepstral flux (CF). In these experiments, 8 Hz MCME gives the best discrimination performance: it reduces the discrimination error by 71% relative to 4 Hz ME and by 53% relative to CF.
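
The idea of measuring modulation energy along cepstral trajectories can be illustrated with a minimal sketch. The abstract does not give the exact MCME formula, so everything below is an assumption made for illustration: the function name modulation_energy, the Hann window, the 1 Hz half-width of the modulation band, the normalization by total trajectory energy, and the averaging over coefficients. The input is assumed to be an MFCC matrix (frames by coefficients) already computed by some front end; the demo uses synthetic trajectories rather than real audio.

```python
import numpy as np


def modulation_energy(trajectories, frame_rate, mod_freq, half_width=1.0):
    """Fraction of trajectory energy near `mod_freq` (Hz), averaged over coefficients.

    trajectories: array of shape (n_frames, n_coeffs), e.g. MFCCs per frame.
    frame_rate:   frames per second of the feature stream.
    NOTE: illustrative definition only; not the formula from the paper.
    """
    x = np.asarray(trajectories, dtype=float)
    x = x - x.mean(axis=0)                     # remove the DC term per coefficient
    window = np.hanning(x.shape[0])[:, None]   # taper to limit spectral leakage
    spectrum = np.abs(np.fft.rfft(x * window, axis=0)) ** 2
    freqs = np.fft.rfftfreq(x.shape[0], d=1.0 / frame_rate)
    band = np.abs(freqs - mod_freq) <= half_width
    total = spectrum.sum(axis=0) + 1e-12       # guard against division by zero
    return float(np.mean(spectrum[band].sum(axis=0) / total))


# Synthetic demo: 2 s of 13 "coefficients" at 100 frames per second (assumed values).
rng = np.random.default_rng(0)
frame_rate = 100
t = np.arange(200) / frame_rate
phases = rng.uniform(0.0, 2.0 * np.pi, 13)
noise = 0.1 * rng.standard_normal((200, 13))
fast = np.sin(2 * np.pi * 8 * t[:, None] + phases) + noise   # 8 Hz modulation
slow = np.sin(2 * np.pi * 2 * t[:, None] + phases) + noise   # 2 Hz modulation

print("8 Hz energy, 8 Hz-modulated input:", modulation_energy(fast, frame_rate, 8.0))
print("8 Hz energy, 2 Hz-modulated input:", modulation_energy(slow, frame_rate, 8.0))
```

Sweeping mod_freq from 4 to 20 in such a function would mirror the range of modulation frequencies explored in the experiments, although the classifier and data used in the paper are not described in the abstract and are not reproduced here.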