A performance assessment of objective measures for evaluating the quality of glottal waveform estimates

  • Authors:
  • E. Moore; J. Torres

  • Affiliations:
  • Georgia Institute of Technology, School of Electrical and Computer Engineering, 210 Technology Circle, Savannah, GA 31407, USA (both authors)

  • Venue:
  • Speech Communication
  • Year:
  • 2008

Abstract

Automatic glottal waveform estimation remains a challenging problem in speech analysis. Criteria for objectively assessing the quality of glottal waveform estimates would facilitate the design of more robust estimation algorithms. The aim of this paper is to investigate the performance of potential glottal waveform quality measures (GQMs) and to determine whether a combination of these GQMs can consistently provide an accurate assessment of glottal waveform estimate quality across several speakers and phonemes. We develop an experimental setup that produces disjoint sets of high- and low-quality glottal waveform estimates from real speech, and we use these data to objectively assess the performance of 12 GQMs on a sustained-vowel speech dataset spanning 16 male speakers and 3 phonemes. In addition, we present a rank-based method (RB-GQA) that allows arbitrary GQM subsets to be combined effectively. Using this method, we perform an exhaustive search over the GQM subset space to determine the best-performing GQM combinations for different groups of speakers and phonemes. Although the optimal GQM combinations were found to be speaker- and phoneme-dependent, optimization across all utterances (speaker-phoneme pairs) yielded a combination of 4 GQMs (ratio of the first harmonic to the maximum harmonic over 0 to 3.7 kHz, group delay variance, phase-plane cycles per period, and phase-plane mean sub-cycle length) that performed very well on almost every utterance in the dataset and nearly matched the performance of the GQM subsets obtained via phoneme-dependent optimization.
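The abstract does not say how RB-GQA converts GQM values into ranks, how the ranks are merged, or how a subset's performance is scored. The following is a minimal sketch of the general idea (rank fusion plus exhaustive subset search), not the authors' RB-GQA. It assumes: (1) every GQM yields one scalar per glottal waveform estimate with a known "higher is better" or "lower is better" direction; (2) ranks are merged by simple averaging; and (3) a subset is scored by pairwise ordering accuracy, i.e., how often a known high-quality estimate outranks a known low-quality one. The function names and the toy GQM names (`gqm_a`, `gqm_b`) are hypothetical. With 12 GQMs, the exhaustive search visits 2^12 - 1 = 4095 non-empty subsets.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple


def ranks(values: Sequence[float], higher_is_better: bool = True) -> List[float]:
    """Return 1-based ranks (1 = best estimate); tied values share their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i], reverse=higher_is_better)
    result = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of tied values that starts at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def combined_rank(scores: Dict[str, Sequence[float]],
                  subset: Tuple[str, ...],
                  higher_is_better: Dict[str, bool]) -> List[float]:
    """Average each estimate's rank over the GQMs in `subset` (lower = better).
    Assumption: plain mean-rank fusion stands in for RB-GQA here."""
    n = len(scores[subset[0]])
    total = [0.0] * n
    for name in subset:
        for idx, r in enumerate(ranks(scores[name], higher_is_better[name])):
            total[idx] += r
    return [t / len(subset) for t in total]


def pairwise_accuracy(combined: Sequence[float], is_high: Sequence[bool]) -> float:
    """Fraction of (high, low) pairs where the high-quality estimate ranks better.
    Ties count as half. Both classes must be non-empty."""
    highs = [c for c, h in zip(combined, is_high) if h]
    lows = [c for c, h in zip(combined, is_high) if not h]
    wins = 0.0
    for hi_c in highs:
        for lo_c in lows:
            wins += 1.0 if hi_c < lo_c else 0.5 if hi_c == lo_c else 0.0
    return wins / (len(highs) * len(lows))


def best_subset(scores: Dict[str, Sequence[float]],
                higher_is_better: Dict[str, bool],
                is_high: Sequence[bool]) -> Optional[Tuple[float, Tuple[str, ...]]]:
    """Exhaustively score every non-empty GQM subset; return (accuracy, subset)."""
    names = sorted(scores)
    best: Optional[Tuple[float, Tuple[str, ...]]] = None
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            fused = combined_rank(scores, subset, higher_is_better)
            acc = pairwise_accuracy(fused, is_high)
            if best is None or acc > best[0]:
                best = (acc, subset)
    return best


if __name__ == "__main__":
    # Toy, made-up data: four estimates, the first two labeled high quality.
    toy_scores = {
        "gqm_a": [0.9, 0.8, 0.2, 0.1],  # separates the classes well
        "gqm_b": [0.1, 0.2, 0.3, 0.4],  # ordered the wrong way round
    }
    toy_directions = {"gqm_a": True, "gqm_b": True}
    toy_labels = [True, True, False, False]
    print(best_subset(toy_scores, toy_directions, toy_labels))
```

On the toy data the search keeps `gqm_a` alone, because adding `gqm_b` pulls every fused rank to the same value and drops the pairwise accuracy to 0.5. Averaging ranks rather than raw values is one simple way to combine measures that live on different scales; the paper's actual weighting and scoring may differ.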