We describe an algorithm that automatically estimates the voice onset time (VOT) of plosives. The VOT is the time delay between the burst onset and the start of periodicity when the plosive is followed by a voiced sound. Because the VOT is affected by factors such as place of articulation and voicing, it can be used to infer these factors. The algorithm uses the reassignment spectrum of the speech signal, a high-resolution time-frequency representation that simplifies the detection of the acoustic events in a plosive. We evaluate the performance of the algorithm on a subset of the TIMIT database by comparison with manual VOT measurements. On average, the difference between the automatic and manual measurements is smaller than 10 ms for 76.1% and smaller than 20 ms for 91.4% of the plosive segments. We also provide analysis statistics of the VOT of /b/, /d/, /g/, /p/, /t/ and /k/ and experimentally verify some sources of variability. Finally, to illustrate possible applications, we integrate the automatic VOT estimates as an additional feature in an HMM-based speech recognition system and show a small but statistically significant improvement in phone recognition rate.
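To make the VOT definition and the tolerance-based evaluation concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the function names `vot_ms` and `fraction_within` are hypothetical, the event times are assumed to be already detected, and the sample values are invented for illustration only.

```python
def vot_ms(burst_onset_s, voicing_onset_s):
    """VOT in milliseconds: delay from burst onset to start of periodicity.

    Both arguments are event times in seconds (assumed to be given).
    """
    return (voicing_onset_s - burst_onset_s) * 1000.0


def fraction_within(auto_ms, manual_ms, tol_ms):
    """Share of segments whose |automatic - manual| VOT is below tol_ms."""
    if len(auto_ms) != len(manual_ms) or not auto_ms:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(1 for a, m in zip(auto_ms, manual_ms) if abs(a - m) < tol_ms)
    return hits / len(auto_ms)


# Invented illustrative values in milliseconds, NOT data from the paper.
manual = [12.0, 65.0, 20.0, 80.0]
auto = [15.0, 60.0, 45.0, 68.0]

within_10 = fraction_within(auto, manual, 10.0)  # two of four segments
within_20 = fraction_within(auto, manual, 20.0)  # three of four segments
```

Applied to a real set of automatic and manual measurements, the same two calls would yield figures of the kind reported above (the percentage of plosive segments within 10 ms and within 20 ms).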