Glottal wave analysis with Pitch Synchronous Iterative Adaptive Inverse Filtering
Speech Communication - Eurospeech '91
Acoustic characteristics of voice quality
Speech Communication - Special issue on phonetics and phonology of speaking styles: reduction and elaboration in speech communication
Support vector machines: hype or hallelujah?
ACM SIGKDD Explorations Newsletter - Special issue on “Scalable data mining algorithms”
Fuzzy Sets and Systems: Theory and Applications
Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond
Linear Prediction of Speech
Solving Multi-class Pattern Recognition Problems with Tree-Structured Support Vector Machines
Proceedings of the 23rd DAGM-Symposium on Pattern Recognition
The role of voice quality in communicating emotion, mood and attitude
Speech Communication - Special issue on speech and emotion
Combining Pattern Classifiers: Methods and Algorithms
Pattern Recognition and Machine Learning (Information Science and Statistics)
Comparison of Neural Classification Algorithms Applied to Land Cover Mapping
Proceedings of the 2009 conference on New Directions in Neural Networks: 18th Italian Workshop on Neural Networks: WIRN 2008
Glottal closure instant detection using Lines of Maximum Amplitudes (LOMA) of the wavelet transform
ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
A review of glottal waveform analysis
Progress in nonlinear speech processing
Fuzzy-input fuzzy-output one-against-all support vector machines
KES'07/WIRN'07 Proceedings of the 11th international conference, KES 2007 and XVII Italian workshop on neural networks conference on Knowledge-based intelligent information and engineering systems: Part III
Comparison of multiclass SVM decomposition schemes for visual object recognition
PR'05 Proceedings of the 27th DAGM conference on Pattern Recognition
A Method for Automatic Detection of Vocal Fry
IEEE Transactions on Audio, Speech, and Language Processing
IEEE Transactions on Neural Networks
Step-wise emotion recognition using concatenated-HMM
Proceedings of the 14th ACM international conference on Multimodal interaction
Multimodal prediction of expertise and leadership in learning groups
Proceedings of the 1st International Workshop on Multimodal Learning Analytics
Virtual character performance from speech
Proceedings of the 12th ACM SIGGRAPH/Eurographics Symposium on Computer Animation
Towards higher quality character performance in previz
Proceedings of the Symposium on Digital Production
Audiovisual behavior descriptors for depression assessment
Proceedings of the 15th ACM on International conference on multimodal interaction
Pattern classification and clustering: A review of partially supervised learning approaches
Pattern Recognition Letters
The dynamic use of voice qualities in spoken language can reveal useful information about a speaker's attitude, mood, and affective state. This information is potentially valuable for a range of speech technology applications, on both the input and output side. However, voice quality annotation of speech signals frequently produces inconsistent labels: groups of annotators may disagree on the perceived voice quality, and it is unclear whom to trust, or whether the truth lies somewhere in between. The current study first describes a voice quality feature set suitable for differentiating voice qualities along a tense-to-breathy dimension. It then uses these features as inputs to a fuzzy-input fuzzy-output support vector machine (F^2SVM) algorithm, which softly categorizes voice quality recordings. In a thorough analysis, the F^2SVM is compared to standard crisp approaches and shows promising results, outperforming, for example, standard support vector machines that differ only in receiving crisp rather than fuzzy label information during training. Overall, accuracies of around 90% are achieved in both speaker-dependent (cross-validation) and speaker-independent (leave-one-speaker-out validation) experiments. In a cross-corpus experiment (i.e., training and testing under entirely different recording conditions), the F^2SVM reaches an accuracy of 82% in a frame-wise analysis and of around 97% after temporal integration over full sentences. Furthermore, the fuzzy output measures give performance close to that of human annotators.
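Two ideas from the abstract can be illustrated compactly: deriving a fuzzy label from disagreeing annotators, and temporally integrating frame-wise soft outputs over a sentence. The sketch below is an assumption-laden illustration, not the paper's implementation: the class names ("tense", "modal", "breathy"), the vote-fraction labeling, and the simple averaging scheme are choices made for the example only.

```python
from collections import Counter

CLASSES = ("tense", "modal", "breathy")  # illustrative voice quality classes

def fuzzy_label(votes):
    """Turn a list of annotator votes (class names) into a fuzzy
    membership vector: the fraction of annotators choosing each class.
    Disagreement is preserved rather than collapsed to a crisp label."""
    counts = Counter(votes)
    total = len(votes)
    return {c: counts.get(c, 0) / total for c in CLASSES}

def integrate_sentence(frame_memberships):
    """Temporal integration sketch: average the frame-wise fuzzy
    outputs over a sentence, then pick the class with the highest
    mean membership as the sentence-level decision."""
    n = len(frame_memberships)
    mean = {c: sum(m[c] for m in frame_memberships) / n for c in CLASSES}
    return max(mean, key=mean.get), mean
```

A crisp SVM would train on the majority vote of each frame; the fuzzy variant instead keeps the full membership vector as the training target, which is the sole difference highlighted in the comparison above.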