A performance assessment of objective measures for evaluating the quality of glottal waveform estimates

Authors:
E. Moore; J. Torres
Affiliations:
Georgia Institute of Technology, School of Electrical and Computer Engineering, 210 Technology Circle, Savannah, GA 31407, USA (both authors)
Venue:
Speech Communication
Year:
2008

Abstract

Automatic glottal waveform estimation remains a challenging problem in speech analysis. Objective criteria for assessing the quality of glottal waveform estimates would facilitate the design of more robust estimation algorithms. The aim of this paper is to investigate the performance of potential glottal waveform quality measures (GQMs) and to determine whether a combination of these GQMs can consistently provide an accurate assessment of glottal waveform estimate quality across several speakers and phonemes. We develop an experimental setup that produces disjoint sets of high- and low-quality glottal waveform estimates from real speech, and we use these data to objectively assess the performance of 12 GQMs on a sustained-vowel speech dataset spanning 16 male speakers and 3 phonemes. In addition, we present a rank-based method (RB-GQA) that allows arbitrary GQM subsets to be effectively combined. Using this method, we perform an exhaustive search of the GQM subset space to determine the best-performing GQM combinations for different groups of speakers and phonemes. Although the optimal GQM combinations were found to be speaker- and phoneme-dependent, optimization across all utterances (speaker-phoneme pairs) yielded a combination of 4 GQMs (the ratio of the first harmonic to the maximum harmonic over 0-3.7 kHz, group delay variance, phase-plane cycles per period, and phase-plane mean sub-cycle length) that performed very well on almost every utterance in the dataset and nearly matched the performance of the GQM subsets obtained via phoneme-dependent optimization.
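The abstract does not specify how RB-GQA fuses measures or how a subset is scored, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes that each candidate glottal waveform estimate has one score per GQM, that each GQM has a known orientation (higher or lower is better), that a subset is combined by averaging per-measure ranks, and that a subset is judged by the fraction of (high-quality, low-quality) pairs in which the high-quality estimate receives the better combined rank. All function names (`ranks`, `combined_rank`, `separation`, `best_subset`) and the toy data are hypothetical. With 12 GQMs, the exhaustive search visits 2^12 - 1 = 4095 non-empty subsets, which is cheap.

```python
from itertools import combinations
from typing import Sequence


def ranks(values: Sequence[float], higher_is_better: bool) -> list[float]:
    """Rank candidates on one measure (rank 1 = best); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=higher_is_better)
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend over a run of tied values and give each member the mean rank of the run.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = avg_rank
        pos = end + 1
    return result


def combined_rank(scores, higher_is_better, subset):
    """Assumed fusion rule: mean rank over the chosen measures (lower = better)."""
    per_measure = [ranks([row[m] for row in scores], higher_is_better[m]) for m in subset]
    return [sum(r[i] for r in per_measure) / len(subset) for i in range(len(scores))]


def separation(combined, is_high_quality):
    """Fraction of (high, low) pairs where the high-quality estimate ranks better (ties = 0.5)."""
    highs = [c for c, h in zip(combined, is_high_quality) if h]
    lows = [c for c, h in zip(combined, is_high_quality) if not h]
    wins = sum(1.0 if h < l else 0.5 if h == l else 0.0 for h in highs for l in lows)
    return wins / (len(highs) * len(lows))


def best_subset(scores, higher_is_better, is_high_quality):
    """Exhaustively score every non-empty measure subset; return (score, subset)."""
    best = None
    for size in range(1, len(higher_is_better) + 1):
        for subset in combinations(range(len(higher_is_better)), size):
            s = separation(combined_rank(scores, higher_is_better, subset), is_high_quality)
            if best is None or s > best[0]:  # strict '>' keeps the smallest subset on ties
                best = (s, subset)
    return best


if __name__ == "__main__":
    # Hypothetical toy data: 4 candidate estimates x 3 measures.
    scores = [
        [0.9, 0.1, 5.0],  # high-quality estimate
        [0.8, 0.2, 1.0],  # high-quality estimate
        [0.2, 0.9, 4.0],  # low-quality estimate
        [0.1, 0.8, 2.0],  # low-quality estimate
    ]
    higher_is_better = [True, False, True]  # e.g. a variance-type measure: lower is better
    is_high_quality = [True, True, False, False]
    print(best_subset(scores, higher_is_better, is_high_quality))  # -> (1.0, (0,))
```

Mean-rank fusion is used here because ranks put measures with very different units and ranges on a common scale without any per-measure normalization; a weighted or median rank would be an equally plausible assumption.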