Automatic glottal waveform estimation remains a challenging problem in speech analysis. Criteria for objectively assessing the quality of glottal waveform estimates would facilitate the design of more robust estimation algorithms. The aim of this paper is to investigate the performance of potential glottal waveform quality measures (GQMs) and to determine whether a combination of these GQMs can consistently provide an accurate assessment of glottal waveform estimate quality across several speakers and phonemes. We develop an experimental setup that produces disjoint sets of high-quality and low-quality glottal waveform estimates from real speech, and we use this data to objectively assess the performance of 12 GQMs on a sustained-vowel speech dataset spanning 16 male speakers and 3 phonemes. In addition, we present a rank-based method (RB-GQA) that allows arbitrary GQM subsets to be effectively combined. Using this method, we perform an exhaustive search of the GQM subset space to determine the best-performing GQM combinations for different groups of speakers and phonemes. Although the optimal GQM combinations were found to be speaker and phoneme dependent, optimization across all utterances (speaker-phoneme pairs) yielded a combination of 4 GQMs (ratio of first harmonic to maximum harmonic over 0-3.7 kHz, group delay variance, phase-plane cycles per period, and phase-plane mean sub-cycle length) that performed very well on almost every utterance in the dataset and nearly matched the performance of the GQM subsets obtained via phoneme-dependent optimization.
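The abstract does not specify how RB-GQA combines ranks or how subsets are scored, so the following is only a minimal sketch of the general idea of rank-based combination followed by an exhaustive subset search. It assumes (hypothetically) that every measure is oriented so that a lower value means a better estimate, that ranks are combined by simple summation, and that a subset is scored by the fraction of (high-quality, low-quality) pairs it orders correctly. The function names and the scoring criterion are illustrative and are not taken from the paper.

```python
from itertools import combinations


def ranks(values):
    """Return the ascending rank of each value (0 = smallest).

    Ties are broken by original position, which is an arbitrary choice.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order):
        result[index] = position
    return result


def combined_rank(scores, subset):
    """Sum the per-measure ranks of each estimate over the chosen measures.

    scores[i][m] is the value of measure m for estimate i.
    Assumption: lower measure values indicate better estimates.
    """
    totals = [0] * len(scores)
    for m in subset:
        column = [row[m] for row in scores]
        for i, r in enumerate(ranks(column)):
            totals[i] += r
    return totals


def separation_accuracy(scores, is_high_quality, subset):
    """Fraction of (high, low) pairs in which the high-quality estimate
    receives the strictly smaller combined rank (hypothetical criterion)."""
    totals = combined_rank(scores, subset)
    high = [t for t, h in zip(totals, is_high_quality) if h]
    low = [t for t, h in zip(totals, is_high_quality) if not h]
    correct = sum(1 for a in high for b in low if a < b)
    return correct / (len(high) * len(low))


def best_subset(scores, is_high_quality):
    """Exhaustively evaluate every non-empty subset of measures.

    Smaller subsets are visited first, and a later subset replaces the
    current best only if it is strictly more accurate.
    """
    n_measures = len(scores[0])
    best, best_accuracy = None, -1.0
    for size in range(1, n_measures + 1):
        for subset in combinations(range(n_measures), size):
            accuracy = separation_accuracy(scores, is_high_quality, subset)
            if accuracy > best_accuracy:
                best, best_accuracy = subset, accuracy
    return best, best_accuracy
```

With 12 measures, such a search visits 2^12 - 1 = 4095 non-empty subsets, which is small enough to enumerate directly. Working with ranks rather than raw values avoids having to normalise measures that live on different scales, at the cost of discarding the size of the differences between estimates.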