A method for fundamental frequency estimation and voicing decision: Application to infant utterances recorded in real acoustical environments

Authors:
Tomohiro Nakatani;Shigeaki Amano;Toshio Irino;Kentaro Ishizuka;Tadahisa Kondo
Affiliations:
NTT Communication Science Labs., NTT Corporation, 2-4, Hikaridai, Seikacho, Sorakugun, Kyoto 619-0237, Japan;NTT Communication Science Labs., NTT Corporation, 2-4, Hikaridai, Seikacho, Sorakugun, Kyoto 619-0237, Japan;Faculty of Systems Engineering, Wakayama University, Japan;NTT Communication Science Labs., NTT Corporation, 2-4, Hikaridai, Seikacho, Sorakugun, Kyoto 619-0237, Japan;NTT Communication Science Labs., NTT Corporation, 2-4, Hikaridai, Seikacho, Sorakugun, Kyoto 619-0237, Japan
Venue:
Speech Communication
Year:
2008

Citing 1
Cited 3

Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds

Speech Communication

Robust pitch estimation using a wavelet variance analysis model

Signal Processing

Noise robust F0 determination and epoch-marking algorithms

Signal Processing

Noise robust voice activity detection based on periodic to aperiodic component ratio

Speech Communication

Abstract

This paper proposes a method for fundamental frequency (F0) estimation and voicing decision that can handle a wide range of speech signals, including adult and infant utterances recorded in real noisy environments. Infant utterances in particular have characteristics that differ from those of adult utterances, such as a wide F0 range, abrupt F0 transitions, and distinctive energy distribution patterns over frequency. Conventional methods, which were developed mainly for adult utterances, therefore do not necessarily work well for infant utterances, especially when the signals are contaminated by background noise. The proposed method introduces several techniques to cope with this problem. We show that the ripple-enhanced power spectrum (REPS) based method can estimate F0s robustly, and that the use of instantaneous frequency (IF) refines the accuracy of the F0 estimates. In addition, the degree of dominance, defined based on the IF, is introduced as a robust voicing decision measure. The effectiveness of the proposed method is confirmed in terms of gross pitch errors and voicing decision errors, in comparison with the recently proposed methods Praat and YIN, using both longitudinal recordings of Japanese infant utterances and adult utterances.
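
The two evaluation measures named in the abstract, gross pitch errors and voicing decision errors, can be illustrated with a minimal sketch. This is not the authors' evaluation code. It assumes frame-aligned reference and estimated F0 tracks in Hz, with 0 marking an unvoiced frame, and a 20% relative deviation as the gross-error threshold. That threshold is a common convention in the F0 estimation literature, not a value stated in the abstract, and the function name `f0_error_rates` and its parameters are hypothetical.

```python
from typing import Sequence, Tuple


def f0_error_rates(
    ref_f0: Sequence[float],
    est_f0: Sequence[float],
    tolerance: float = 0.2,  # assumed 20% threshold, not taken from the abstract
) -> Tuple[float, float]:
    """Return (gross pitch error rate, voicing decision error rate).

    Both tracks hold one F0 value in Hz per frame; a value of 0 marks
    an unvoiced frame (an assumed convention for this sketch).
    """
    if len(ref_f0) != len(est_f0):
        raise ValueError("reference and estimate must have the same length")
    if not ref_f0:
        raise ValueError("tracks must contain at least one frame")

    both_voiced = 0     # frames voiced in both reference and estimate
    gross_errors = 0    # both-voiced frames whose F0 deviates too much
    voicing_errors = 0  # frames where the voiced/unvoiced labels disagree

    for ref, est in zip(ref_f0, est_f0):
        ref_voiced = ref > 0
        est_voiced = est > 0
        if ref_voiced != est_voiced:
            voicing_errors += 1
        elif ref_voiced:
            both_voiced += 1
            if abs(est - ref) > tolerance * ref:
                gross_errors += 1

    gpe = gross_errors / both_voiced if both_voiced else 0.0
    vde = voicing_errors / len(ref_f0)
    return gpe, vde


# Toy five-frame track with made-up values in an infant-like F0 range.
reference = [0.0, 400.0, 410.0, 420.0, 0.0]
estimate = [0.0, 405.0, 820.0, 0.0, 300.0]
gpe, vde = f0_error_rates(reference, estimate)
print(f"GPE = {gpe:.2f}, VDE = {vde:.2f}")  # GPE = 0.50, VDE = 0.40
```

In the toy track, the octave jump in the third frame counts as a gross pitch error, while the fourth and fifth frames count as voicing decision errors because the voiced/unvoiced labels disagree. Gross pitch errors are normalized here by the frames that both tracks mark as voiced; other normalizations exist, and the abstract does not say which one the paper uses.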