This paper proposes a method for fundamental frequency (F0) estimation and voicing decision that can handle a wide range of speech signals, including adult and infant utterances recorded in real noisy environments. Infant utterances in particular have characteristics that differ from those of adults, such as a wide F0 range, abrupt F0 transitions, and distinctive energy distribution patterns over frequency. Consequently, conventional methods developed mainly for adult utterances do not necessarily work well for infant utterances, especially when the signals are contaminated by background noise. Several techniques are introduced into the proposed method to cope with this problem. We show that the ripple-enhanced power spectrum (REPS) based method can estimate F0 robustly, and that the use of instantaneous frequency (IF) refines the accuracy of the F0 estimates. In addition, the degree of dominance, defined on the basis of the IF, is introduced as a robust voicing decision measure. The effectiveness of the proposed method is confirmed in terms of gross pitch errors and voicing decision errors in comparison with the recently proposed methods, Praat and YIN, using both longitudinal recordings of Japanese infant utterances and adult utterances.
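The abstract names gross pitch errors and voicing decision errors as the evaluation criteria but does not define them. The sketch below shows one common way such metrics are computed from frame-wise F0 tracks. It is an illustration under stated assumptions, not the paper's evaluation code: the function names, the 20% relative tolerance, and the convention that an F0 of 0 marks an unvoiced frame are all hypothetical choices made here.

```python
def gross_pitch_error(ref_f0, est_f0, tol=0.2):
    """Fraction of frames voiced in both tracks whose estimate deviates
    from the reference by more than `tol` (relative).

    Assumptions (not from the paper): F0 <= 0 means "unvoiced",
    and tol=0.2 (20%) is the gross-error threshold.
    """
    if len(ref_f0) != len(est_f0):
        raise ValueError("F0 tracks must have the same number of frames")
    # Only frames that both tracks mark as voiced are compared.
    both_voiced = [(r, e) for r, e in zip(ref_f0, est_f0) if r > 0 and e > 0]
    if not both_voiced:
        return 0.0
    gross = sum(1 for r, e in both_voiced if abs(e - r) > tol * r)
    return gross / len(both_voiced)


def voicing_decision_error(ref_f0, est_f0):
    """Fraction of all frames where the voiced/unvoiced decision differs
    between the reference and the estimate (same F0 <= 0 convention)."""
    if len(ref_f0) != len(est_f0):
        raise ValueError("F0 tracks must have the same number of frames")
    if not ref_f0:
        return 0.0
    wrong = sum(1 for r, e in zip(ref_f0, est_f0) if (r > 0) != (e > 0))
    return wrong / len(ref_f0)


if __name__ == "__main__":
    # Toy frame-wise F0 tracks in Hz; 0 marks an unvoiced frame.
    ref = [0, 200, 210, 400, 0]
    est = [0, 205, 420, 0, 150]
    print("GPE:", gross_pitch_error(ref, est))
    print("VDE:", voicing_decision_error(ref, est))
```

In the toy tracks, the third frame is an octave error (420 Hz against 210 Hz) and counts as a gross pitch error, while the last two frames disagree on voicing and count only toward the voicing decision error. Keeping the two metrics separate in this way means a voicing mistake does not inflate the pitch error rate.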