In this paper, we propose Mel-cepstrum modulation energy (MCME), an extension of modulation energy (ME), as a feature for discriminating speech from music. MCME is extracted from the time trajectories of Mel-frequency cepstral coefficients (MFCCs), whereas ME is computed from the spectrum. Because cepstral coefficients are mutually uncorrelated, we expect MCME to perform better than ME. To find the best modulation frequency for MCME, we run experiments with modulation frequencies from 4 Hz to 20 Hz and compare the results with those obtained from ME and from the MFCC-based cepstral flux (CF). In these experiments, 8 Hz MCME gives the best discrimination performance: it reduces the discrimination error by 71% relative to 4 Hz ME and by 53% relative to CF.
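The idea of measuring energy at a single modulation frequency along a coefficient's time trajectory can be illustrated with a minimal sketch. The abstract does not give the exact MCME definition, so everything here is an assumption made for illustration: the function name `modulation_energy`, the non-overlapping analysis windows, the Hann taper, the per-window mean removal, and the plain sum over coefficients. The input `traj` is assumed to be a precomputed matrix of MFCCs (one row per frame); passing spectral band energies instead would give an ME-like quantity.

```python
import numpy as np

def modulation_energy(traj, frame_rate, mod_freq, win_len):
    """Energy at one modulation frequency, per analysis window.

    traj       : (n_frames, n_coeffs) array of feature trajectories
                 (assumed here to be MFCCs computed elsewhere).
    frame_rate : feature frames per second.
    mod_freq   : modulation frequency to probe, in Hz (e.g. 8.0).
    win_len    : analysis window length in frames (hypothetical choice).
    """
    traj = np.asarray(traj, dtype=float)
    if traj.ndim == 1:
        traj = traj[:, None]  # treat a single trajectory as one coefficient

    n_win = traj.shape[0] // win_len
    n = np.arange(win_len)
    # Tapered complex exponential: a single DFT "bin" at mod_freq.
    probe = np.hanning(win_len) * np.exp(-2j * np.pi * mod_freq * n / frame_rate)

    energies = np.empty(n_win)
    for k in range(n_win):
        seg = traj[k * win_len:(k + 1) * win_len]
        seg = seg - seg.mean(axis=0)           # remove the DC component
        coeff = probe @ seg                    # one complex value per coefficient
        energies[k] = np.sum(np.abs(coeff) ** 2)  # sum energy over coefficients
    return energies

# Usage on synthetic data: two trajectories modulated at 8 Hz,
# sampled at an assumed 100 frames per second.
t = np.arange(300) / 100.0
demo = np.stack([np.sin(2 * np.pi * 8 * t), 0.5 * np.cos(2 * np.pi * 8 * t)], axis=1)
for f in (4.0, 8.0, 20.0):
    print(f, modulation_energy(demo, 100.0, f, 100))
```

On the synthetic input the 8 Hz probe should dominate the 4 Hz and 20 Hz probes; on real data, sweeping `mod_freq` over the 4 Hz to 20 Hz range mirrors the kind of comparison described in the abstract, though the actual features and classifier used by the authors are not specified here.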