A method for fundamental frequency estimation and voicing decision: Application to infant utterances recorded in real acoustical environments

  • Authors:
  • Tomohiro Nakatani;Shigeaki Amano;Toshio Irino;Kentaro Ishizuka;Tadahisa Kondo

  • Affiliations:
  • NTT Communication Science Labs., NTT Corporation, 2-4, Hikaridai, Seikacho, Sorakugun, Kyoto 619-0237, Japan;NTT Communication Science Labs., NTT Corporation, 2-4, Hikaridai, Seikacho, Sorakugun, Kyoto 619-0237, Japan;Faculty of Systems Engineering, Wakayama University, Japan;NTT Communication Science Labs., NTT Corporation, 2-4, Hikaridai, Seikacho, Sorakugun, Kyoto 619-0237, Japan;NTT Communication Science Labs., NTT Corporation, 2-4, Hikaridai, Seikacho, Sorakugun, Kyoto 619-0237, Japan

  • Venue:
  • Speech Communication
  • Year:
  • 2008

Abstract

This paper proposes a method for fundamental frequency (F0) estimation and voicing decision that can handle a wide range of speech signals, including adult and infant utterances recorded in real noisy environments. Infant utterances in particular have characteristics that differ from those of adults, such as a wide F0 range, abrupt F0 transitions, and unique energy distribution patterns across frequencies. Conventional methods, which were developed mainly for adult utterances, therefore do not necessarily work well for infant utterances, especially when the signals are contaminated by background noise. Several techniques are introduced into the proposed method to cope with this problem. We show that the ripple-enhanced power spectrum (REPS) based method can estimate F0s robustly, and that the use of instantaneous frequency (IF) allows the F0 estimates to be refined. In addition, the degree of dominance, defined on the basis of the IF, is introduced as a robust voicing decision measure. The effectiveness of the proposed method is confirmed in terms of gross pitch errors and voicing decision errors, in comparison with the recently proposed methods Praat and YIN, using both longitudinal recordings of Japanese infant utterances and adult utterances.
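
The two evaluation measures named in the abstract, gross pitch errors and voicing decision errors, can be illustrated with a minimal sketch. The abstract does not give the exact definitions used in the paper, so everything specific here is an assumption: the function name `pitch_error_rates` is hypothetical, an F0 of 0.0 is taken to mark an unvoiced frame, a frame counts as a gross error when the estimate deviates from the reference by more than a relative tolerance (20% here, a commonly used value), and the gross error rate is computed only over frames that both tracks mark as voiced.

```python
from typing import Sequence, Tuple


def pitch_error_rates(
    ref_f0: Sequence[float],
    est_f0: Sequence[float],
    gross_tolerance: float = 0.2,  # assumed relative tolerance, not taken from the paper
) -> Tuple[float, float]:
    """Return (gross_pitch_error_rate, voicing_decision_error_rate).

    Both inputs are frame-wise F0 tracks in Hz. A value of 0.0 marks an
    unvoiced frame (an assumed convention for this sketch).
    """
    if len(ref_f0) != len(est_f0):
        raise ValueError("reference and estimate must have the same number of frames")
    if not ref_f0:
        raise ValueError("at least one frame is required")

    voicing_errors = 0  # frames where the voiced/unvoiced decisions disagree
    both_voiced = 0     # frames voiced in both reference and estimate
    gross_errors = 0    # both-voiced frames whose F0 is off by more than the tolerance

    for ref, est in zip(ref_f0, est_f0):
        ref_voiced = ref > 0.0
        est_voiced = est > 0.0
        if ref_voiced != est_voiced:
            voicing_errors += 1
        elif ref_voiced:
            both_voiced += 1
            if abs(est - ref) > gross_tolerance * ref:
                gross_errors += 1

    # Gross errors are normalized by both-voiced frames, voicing errors by all frames.
    gpe = gross_errors / both_voiced if both_voiced else 0.0
    vde = voicing_errors / len(ref_f0)
    return gpe, vde
```

For example, with a reference track `[0, 400, 400, 400, 0]` and an estimate `[0, 410, 800, 0, 300]`, the octave error in the third frame is the only gross error among the two frames voiced in both tracks, and the last two frames are voicing decision errors, so the function returns `(0.5, 0.4)`.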