Time-frequency feature extraction from spectrograms and wavelet packets with application to automatic stress and emotion classification in speech

Authors:
Ling He;Margaret Lech;Namunu C. Maddage;Nicholas B. Allen
Affiliations:
School of Electrical and Computer Engineering, RMIT University, Melbourne, Australia;School of Electrical and Computer Engineering, RMIT University, Melbourne, Australia;School of Electrical and Computer Engineering, RMIT University, Melbourne, Australia;Department of Psychology, The University of Melbourne, Melbourne, Australia
Venue:
ICICS'09: Proceedings of the 7th International Conference on Information, Communications and Signal Processing
Year:
2009



Abstract

Three new methods of feature extraction based on time-frequency analysis of speech are presented and compared. In the first approach, speech spectrograms were passed through a bank of 12 log-Gabor filters and the outputs were averaged. In the second approach, the spectrograms were subdivided into equivalent rectangular bandwidth (ERB) frequency bands and the average energy of each band was calculated. In the third approach, wavelet packet arrays were calculated, passed through a bank of 12 log-Gabor filters, and averaged. The feature extraction methods were tested on automatic stress and emotion classification. The feature distributions were modeled and classified using a Gaussian mixture model. The test samples included single vowels, words, and sentences from the SUSAS database with 3 classes of stress, and spontaneous speech recordings with 5 emotional classes from the ORI database. Correct classification rates ranged from 64.70% to 84.85% across the different SUSAS data sets and from 39.6% to 53.4% for the ORI database.
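
The second approach (average energy per ERB band of a spectrogram) can be illustrated with a minimal sketch. The abstract does not give the ERB formula, the number of bands, the frequency range, or the spectrogram settings, so everything below is an assumption made for illustration: the Glasberg and Moore ERB-rate scale, 12 bands, a 50 Hz lower edge, and a 256-sample Hann window. The function names are hypothetical and are not taken from the paper.

```python
import numpy as np


def hz_to_erb_rate(f_hz):
    """Glasberg-Moore ERB-rate scale (assumed; the abstract names no formula)."""
    return 21.4 * np.log10(1.0 + 0.00437 * np.asarray(f_hz, dtype=float))


def erb_rate_to_hz(erb):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (np.asarray(erb, dtype=float) / 21.4) - 1.0) / 0.00437


def power_spectrogram(signal, frame_len=256, hop=128):
    """Hann-windowed short-time power spectrum, shape (n_bins, n_frames).

    frame_len is assumed to be even so that bin k sits at k * fs / frame_len.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop:i * hop + frame_len] * window for i in range(n_frames)]
    )
    return (np.abs(np.fft.rfft(frames, axis=1)) ** 2).T


def erb_band_energies(spec, sample_rate, n_bands=12, f_min=50.0):
    """Average spectrogram energy in n_bands bands equally spaced on the ERB scale.

    n_bands and f_min are illustrative choices, not values from the paper.
    """
    n_bins = spec.shape[0]
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)  # centre of each bin
    nyquist = sample_rate / 2.0
    edges = erb_rate_to_hz(
        np.linspace(hz_to_erb_rate(f_min), hz_to_erb_rate(nyquist), n_bands + 1)
    )
    features = np.zeros(n_bands)
    for b in range(n_bands):
        mask = (freqs >= edges[b]) & (freqs < edges[b + 1])
        if b == n_bands - 1:
            mask |= freqs >= edges[b + 1]  # keep the Nyquist bin in the top band
        if mask.any():
            # Mean over all bins and frames that fall inside the band.
            features[b] = spec[mask].mean()
    return features


if __name__ == "__main__":
    fs = 8000
    t = np.arange(fs) / fs
    tone = np.sin(2 * np.pi * 1000.0 * t)  # 1 kHz test tone, 1 second long
    feats = erb_band_energies(power_spectrogram(tone), fs)
    print(np.round(feats / feats.max(), 3))
```

For the 1 kHz test tone the largest feature falls in the band whose edges enclose 1 kHz. In a classification setting, one such feature vector per utterance (or per block of frames) would be passed to the Gaussian mixture model stage; that stage is not shown here.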