Features extraction for speech emotion

  • Authors:
  • Norhaslinda Kamaruddin; Abdul Wahab

  • Affiliations:
  • Center for Computational Intelligence, School of Computer Engineering, Nanyang Technological University, Blk N4 #2A-36, Nanyang Avenue, 639798 Singapore (Corresponding author. Tel.: +65 6790 4948; Fax: +65 6792 6559; E-mail: asabdul@ntu.edu.sg)

  • Venue:
  • Journal of Computational Methods in Sciences and Engineering
  • Year:
  • 2009


Abstract

In this paper, speech emotion verification using two of the most popular methods in speech processing and analysis, the Mel-Frequency Cepstral Coefficient (MFCC) and the Gaussian Mixture Model (GMM), is proposed and analyzed. In both cases, speech emotion features were extracted using the Short-Time Fourier Transform (STFT) for MFCC and the Short-Time Histogram (STH) for GMM. The performance of speech emotion verification is measured with three neural network (NN) and fuzzy neural network (FNN) architectures, namely the Multi-Layer Perceptron (MLP), the Adaptive Neuro-Fuzzy Inference System (ANFIS), and the Generic Self-organizing Fuzzy Neural Network (GenSoFNN). Results obtained from experiments using real audio clips from movies and television sitcoms show the potential of the proposed feature extraction methods for real-time applications owing to their reasonable accuracy and fast training time. This may lead to practical usage if the emotion verifier can be embedded in real-time applications, especially on personal digital assistants (PDAs) or smartphones.
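The abstract does not include the authors' implementation details; as an illustration of the STFT-based MFCC front end it describes, the sketch below shows the standard pipeline (framing, windowing, power spectrum, mel filterbank, log compression, DCT-II) in plain NumPy. All frame sizes, filter counts, and the sample rate are assumed defaults, not values from the paper.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc(signal, sr=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=13):
    # Frame the signal (25 ms frames, 10 ms hop at 16 kHz) and window it:
    # this framing + FFT step is the STFT front end mentioned in the abstract.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Per-frame power spectrum
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Mel filterbank energies, log-compressed (small floor avoids log(0))
    fb = mel_filterbank(n_filters, n_fft, sr)
    logmel = np.log(power @ fb.T + 1e-10)
    # DCT-II decorrelates the log energies; keep the first n_ceps coefficients
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1)
                   / (2.0 * n_filters))
    return logmel @ basis.T  # shape: (n_frames, n_ceps)
```

The resulting per-frame coefficient vectors would then serve as input features to a classifier such as the MLP, ANFIS, or GenSoFNN architectures compared in the paper.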