Automatic voice onset time estimation from reassignment spectra

  • Authors:
  • Veronique Stouten;Hugo Van hamme

  • Affiliations:
  • ESAT Department, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, P.O. 2441, B-3001 Leuven, Belgium;ESAT Department, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, P.O. 2441, B-3001 Leuven, Belgium

  • Venue:
  • Speech Communication
  • Year:
  • 2009

Abstract

We describe an algorithm to automatically estimate the voice onset time (VOT) of plosives. The VOT is the time delay between the burst onset of a plosive and the start of periodicity, defined when the plosive is followed by a voiced sound. Since the VOT is affected by factors such as place of articulation and voicing, it can be used to infer these factors. The algorithm uses the reassignment spectrum of the speech signal, a high-resolution time-frequency representation that simplifies the detection of the acoustic events in a plosive. The performance of our algorithm is evaluated on a subset of the TIMIT database by comparison with manual VOT measurements. On average, the difference between automatic and manual measurements is smaller than 10 ms for 76.1% and smaller than 20 ms for 91.4% of the plosive segments. We also provide statistics of the VOT of /b/, /d/, /g/, /p/, /t/ and /k/ and experimentally verify some sources of variability. Finally, to illustrate possible applications, we integrate the automatic VOT estimates as an additional feature in an HMM-based speech recognition system and show a small but statistically significant improvement in phone recognition rate.
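The VOT definition and the tolerance-based evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `vot_ms` and `agreement_rate` and the sample values are hypothetical, and the detection of the burst and periodicity onsets from the reassignment spectrum is not shown.

```python
def vot_ms(burst_onset_s, voicing_onset_s):
    """Return the VOT in milliseconds from two event times given in seconds.

    Assumes the burst onset and the start of periodicity were already
    located by some detector (hypothetical inputs, not the paper's method).
    """
    return (voicing_onset_s - burst_onset_s) * 1000.0


def agreement_rate(auto_ms, manual_ms, tol_ms):
    """Fraction of segments whose automatic and manual VOT differ by less than tol_ms."""
    if len(auto_ms) != len(manual_ms):
        raise ValueError("auto_ms and manual_ms must have the same length")
    if not auto_ms:
        raise ValueError("at least one segment is required")
    # Count segments where the absolute difference is strictly below the tolerance.
    hits = sum(1 for a, m in zip(auto_ms, manual_ms) if abs(a - m) < tol_ms)
    return hits / len(auto_ms)


if __name__ == "__main__":
    # Made-up VOT values in ms, for illustration only.
    auto = [10.0, 25.0, 60.0, 80.0]
    manual = [15.0, 30.0, 85.0, 75.0]
    for tol in (10.0, 20.0):
        print(f"within {tol:.0f} ms: {agreement_rate(auto, manual, tol):.1%}")
```

Calling `agreement_rate` with tolerances of 10 ms and 20 ms gives the kind of percentages reported in the abstract, computed here on illustrative data only.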