A Method for Automatic Detection of Vocal Fry

  • Authors:
  • C. T. Ishi; K.-I. Sakakibara; H. Ishiguro; N. Hagita

  • Affiliations:
  • Adv. Telecommun. Res. Inst. Int., Intell. Robot. & Commun. Labs., Kyoto;-;-;-

  • Venue:
  • IEEE Transactions on Audio, Speech, and Language Processing
  • Year:
  • 2008

Abstract

Vocal fry (also called creak, creaky voice, and pulse register phonation) is a voice quality that carries important linguistic or paralinguistic information, depending on the language. We propose a set of acoustic measures and a method for automatically detecting vocal fry segments in speech utterances. A glottal pulse-synchronized method is proposed to deal with the very low fundamental frequency properties of vocal fry segments, which cause problems in the classic short-term analysis methods. The proposed acoustic measures characterize power, aperiodicity, and similarity properties of vocal fry signals. The basic idea of the proposed method is to scan for local power peaks in a "very short-term" power contour to obtain glottal pulse candidates, check for periodicity properties, and evaluate a similarity measure between neighboring glottal pulse candidates to decide whether they are likely to be vocal fry pulses. In the periodicity analysis, autocorrelation peak properties are taken into account to avoid misdetection of periodicity in vocal fry segments. Evaluation of the proposed acoustic measures in the automatic detection resulted in 74% correct detection, with an insertion error rate of 13%.
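
The pipeline outlined in the abstract (very short-term power contour, local power peaks as glottal pulse candidates, similarity between neighboring candidates) can be illustrated with a minimal sketch. This is not the authors' implementation. The window and hop lengths, the dB rise threshold, the similarity threshold, and all function names are assumptions chosen for illustration, and the periodicity (autocorrelation) check is omitted for brevity.

```python
import numpy as np

def short_term_power_db(x, sr, win_ms=4.0, hop_ms=2.0):
    """Power contour in dB over very short frames (window/hop are assumed values)."""
    win = max(1, int(sr * win_ms / 1000))
    hop = max(1, int(sr * hop_ms / 1000))
    starts = np.arange(0, len(x) - win + 1, hop)
    power = np.array([np.mean(x[s:s + win] ** 2) for s in starts])
    # Return the contour and the sample index at the center of each frame.
    return 10.0 * np.log10(power + 1e-12), starts + win // 2

def pulse_candidates(power_db, centers, rise_db=7.0, span=5):
    """Local power peaks that rise at least rise_db above both neighborhoods."""
    peaks = []
    for i in range(1, len(power_db) - 1):
        if power_db[i] >= power_db[i - 1] and power_db[i] > power_db[i + 1]:
            left = power_db[max(0, i - span):i].min()
            right = power_db[i + 1:i + 1 + span].min()
            if power_db[i] - left >= rise_db and power_db[i] - right >= rise_db:
                peaks.append(centers[i])
    return np.array(peaks, dtype=int)

def pulse_similarity(x, a, b, half):
    """Peak normalized cross-correlation of the waveforms around two candidates."""
    if a - half < 0 or b + half > len(x):
        return 0.0
    u, v = x[a - half:a + half], x[b - half:b + half]
    corr = np.correlate(u, v, mode="full")
    denom = np.sqrt(np.sum(u ** 2) * np.sum(v ** 2)) + 1e-12
    return float(np.max(corr) / denom)

def detect_fry_pulses(x, sr, min_similarity=0.5, max_gap_ms=100.0, half_ms=5.0):
    """Return sample positions of candidates that resemble a close neighbor."""
    x = np.asarray(x, dtype=float)
    power_db, centers = short_term_power_db(x, sr)
    cands = pulse_candidates(power_db, centers)
    half = int(sr * half_ms / 1000)
    max_gap = int(sr * max_gap_ms / 1000)
    is_fry = np.zeros(len(cands), dtype=bool)
    for k in range(len(cands) - 1):
        a, b = cands[k], cands[k + 1]
        if b - a <= max_gap and pulse_similarity(x, a, b, half) >= min_similarity:
            is_fry[k] = is_fry[k + 1] = True
    return cands[is_fry]
```

Calling `detect_fry_pulses(signal, sr)` on a mono floating-point waveform returns the sample indices of candidate pulses that passed the similarity test. A fuller version would insert the periodicity analysis between candidate picking and the similarity test, so that regularly voiced (modal) segments are not flagged.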