Glottal wave analysis with Pitch Synchronous Iterative Adaptive Inverse Filtering
Speech Communication - Eurospeech '91
Acoustic characteristics of voice quality
Speech Communication - Special issue on phonetics and phonology of speaking styles: reduction and elaboration in speech communication
Support vector machines: hype or hallelujah?
ACM SIGKDD Explorations Newsletter - Special issue on “Scalable data mining algorithms”
Fuzzy Sets and Systems: Theory and Applications
Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond
Linear Prediction of Speech
Solving Multi-class Pattern Recognition Problems with Tree-Structured Support Vector Machines
Proceedings of the 23rd DAGM-Symposium on Pattern Recognition
The role of voice quality in communicating emotion, mood and attitude
Speech Communication - Special issue on speech and emotion
Combining Pattern Classifiers: Methods and Algorithms
Pattern Recognition and Machine Learning (Information Science and Statistics)
Comparison of Neural Classification Algorithms Applied to Land Cover Mapping
Proceedings of the 2009 conference on New Directions in Neural Networks: 18th Italian Workshop on Neural Networks: WIRN 2008
Glottal closure instant detection using Lines of Maximum Amplitudes (LOMA) of the wavelet transform
ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
A review of glottal waveform analysis
Progress in nonlinear speech processing
Fuzzy-input fuzzy-output one-against-all support vector machines
KES'07/WIRN'07 Proceedings of the 11th international conference, KES 2007 and XVII Italian workshop on neural networks conference on Knowledge-based intelligent information and engineering systems: Part III
Comparison of multiclass SVM decomposition schemes for visual object recognition
PR'05 Proceedings of the 27th DAGM conference on Pattern Recognition
A Method for Automatic Detection of Vocal Fry
IEEE Transactions on Audio, Speech, and Language Processing
IEEE Transactions on Neural Networks
Step-wise emotion recognition using concatenated-HMM
Proceedings of the 14th ACM international conference on Multimodal interaction
Multimodal prediction of expertise and leadership in learning groups
Proceedings of the 1st International Workshop on Multimodal Learning Analytics
Virtual character performance from speech
Proceedings of the 12th ACM SIGGRAPH/Eurographics Symposium on Computer Animation
Towards higher quality character performance in previz
Proceedings of the Symposium on Digital Production
Audiovisual behavior descriptors for depression assessment
Proceedings of the 15th ACM on International conference on multimodal interaction
Pattern classification and clustering: A review of partially supervised learning approaches
Pattern Recognition Letters
The dynamic use of voice qualities in spoken language can reveal useful information about a speaker's attitude, mood, and affective state. This information is potentially valuable for a range of speech technology applications, on both the input and output side. However, voice quality annotation of speech signals frequently produces inconsistent labels: groups of annotators may disagree on the perceived voice quality, and it is unclear whom to trust, or whether the truth lies somewhere in between. The current study first describes a voice quality feature set suitable for differentiating voice qualities along a tense-to-breathy dimension. It then uses these features as inputs to a fuzzy-input fuzzy-output support vector machine (F^2SVM) algorithm, which softly categorizes voice quality recordings. In a thorough analysis, the F^2SVM is compared to standard crisp approaches and shows promising results, outperforming, for example, standard support vector machines that differ only in receiving crisp rather than fuzzy label information during training. Overall, accuracies of around 90% are achieved in both speaker-dependent (cross-validation) and speaker-independent (leave-one-speaker-out validation) experiments. In a cross-corpus experiment (i.e., training and testing under entirely different recording conditions), the F^2SVM reaches an accuracy of 82% in a frame-wise analysis and of around 97% after temporal integration over full sentences. Furthermore, the fuzzy output measures give performance close to that of human annotators.
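Two ideas from the abstract can be illustrated compactly: deriving a fuzzy label from disagreeing annotators, and temporally integrating frame-wise soft outputs over a sentence. The sketch below is an assumption-laden illustration, not the paper's implementation: the class names ("tense", "modal", "breathy"), the vote-fraction labeling, and the simple averaging scheme are choices made for the example only.

```python
from collections import Counter

CLASSES = ("tense", "modal", "breathy")  # illustrative voice quality classes

def fuzzy_label(votes):
    """Turn a list of annotator votes (class names) into a fuzzy
    membership vector: the fraction of annotators choosing each class.
    Disagreement is preserved rather than collapsed to a crisp label."""
    counts = Counter(votes)
    total = len(votes)
    return {c: counts.get(c, 0) / total for c in CLASSES}

def integrate_sentence(frame_memberships):
    """Temporal integration sketch: average the frame-wise fuzzy
    outputs over a sentence, then pick the class with the highest
    mean membership as the sentence-level decision."""
    n = len(frame_memberships)
    mean = {c: sum(m[c] for m in frame_memberships) / n for c in CLASSES}
    return max(mean, key=mean.get), mean
```

A crisp SVM would train on the majority vote of each frame; the fuzzy variant instead keeps the full membership vector as the training target, which is the sole difference highlighted in the comparison above.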