Phonetic feature extraction for context-sensitive glottal source processing

Authors:
John Kane;Matthew Aylett;Irena Yanushevskaya;Christer Gobl
Venue:
Speech Communication
Year:
2014

Abstract

The effectiveness of glottal source analysis is known to depend on the phonetic properties of the concomitant supraglottal features. Phonetic classes such as nasals and fricatives are particularly problematic: their acoustic characteristics, including zeros in the vocal tract spectrum and aperiodic noise, can degrade glottal inverse filtering, a prerequisite for glottal source analysis. In this paper, we first describe and evaluate a set of binary feature extractors for phonetic classes relevant to glottal source analysis. Because voice quality classification is typically achieved using feature data derived from glottal source analysis, we then investigate how removing data from certain detected phonetic regions affects classification accuracy. For the phonetic feature extraction, classification algorithms based on Artificial Neural Networks (ANNs), Gaussian Mixture Models (GMMs) and Support Vector Machines (SVMs) are compared. Experiments demonstrate that the discriminative classifiers (i.e., ANNs and SVMs) generally give better results than the generative learning algorithm (i.e., GMMs). Extraction accuracy generally decreases as the feature becomes sparser (e.g., accuracy is lower for nasals than for syllabic regions). We find the best classification of voice quality when using only glottal source parameter data derived within detected syllabic regions.
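
The data-selection step described in the abstract, keeping only glottal source parameter frames that fall inside detected syllabic regions, might be sketched as follows. This is an illustrative sketch and not the authors' implementation: the function name `select_detected_frames`, the per-frame detector scores, and the 0.5 threshold are all assumptions introduced here for the example.

```python
from typing import List, Sequence


def select_detected_frames(
    params: Sequence[Sequence[float]],
    detector_scores: Sequence[float],
    threshold: float = 0.5,  # hypothetical decision threshold, not from the paper
) -> List[Sequence[float]]:
    """Keep the parameter frames whose binary-detector score meets the threshold.

    params: one vector of glottal source parameters per analysis frame.
    detector_scores: one score per frame from a binary phonetic feature
        extractor (for example, a syllabic-region detector).
    """
    if len(params) != len(detector_scores):
        raise ValueError("params and detector_scores must have the same length")
    # Frames scoring below the threshold are treated as lying outside the
    # detected region and are dropped before voice quality classification.
    return [p for p, s in zip(params, detector_scores) if s >= threshold]


# Toy data: four frames with one (made-up) parameter value each.
frames = [[0.1], [0.2], [0.3], [0.4]]
scores = [0.9, 0.2, 0.5, 0.1]
kept = select_detected_frames(frames, scores)
```

With the toy data above, only the first and third frames are kept, since their scores meet the assumed threshold. The retained frames would then be passed to whichever voice quality classifier is in use.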