Speaking-aid systems using GMM-based voice conversion for electrolaryngeal speech

  • Authors:
  • Keigo Nakamura;Tomoki Toda;Hiroshi Saruwatari;Kiyohiro Shikano

  • Affiliations:
  • Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma-shi, Nara 630-0192, Japan (all four authors)

  • Venue:
  • Speech Communication
  • Year:
  • 2012

Abstract

An electrolarynx (EL) is a medical device that generates sound source signals to provide laryngectomees with a voice. In this article we focus on two problems of speech produced with an EL (EL speech). First, EL speech sounds extremely unnatural. Second, an EL generates high-energy sound source signals, which often annoy people nearby. To address these two problems, we propose three speaking-aid systems that enhance three different types of EL speech signals: EL speech, EL speech using an air-pressure sensor (EL-air speech), and silent EL speech. The air-pressure sensor enables a laryngectomee to manipulate the F0 contours of EL speech using exhaled air that flows from the tracheostoma. Silent EL speech is produced with a new sound source unit that generates signals with extremely low energy. Our speaking-aid systems address the poor quality of EL speech using voice conversion (VC), which transforms acoustic features so that the speech sounds as if it were uttered by another person. Our systems estimate spectral parameters, F0, and aperiodic components independently. Experimental evaluations demonstrate that the use of an air-pressure sensor dramatically improves F0 estimation accuracy. Moreover, the converted speech signals are preferred over the source EL speech.
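
The abstract does not spell out the conversion formulation, so the following is only a minimal sketch of how a GMM-based mapping from a source feature frame to a target feature frame can work. It assumes a joint-density GMM over stacked source and target features and uses the conditional expectation E[y | x] as the converted frame. The paper's actual estimation procedure may differ. The function name `gmm_convert` and its parameters are hypothetical, and the GMM parameters are taken as already trained.

```python
import numpy as np

def gmm_convert(x, weights, means, covs, dx):
    """Return E[y | x] under a joint GMM over z = [x; y].

    Assumed layout (illustrative, not taken from the paper):
      weights: (M,)      mixture weights
      means:   (M, D)    joint means, first dx entries belong to x
      covs:    (M, D, D) joint full covariance matrices
      dx:      dimensionality of the source feature x
    """
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)

    n_mix = len(weights)
    log_post = np.empty(n_mix)
    cond_means = []
    for m in range(n_mix):
        mu_x, mu_y = means[m, :dx], means[m, dx:]
        s_xx = covs[m, :dx, :dx]   # source covariance block
        s_yx = covs[m, dx:, :dx]   # target-source cross-covariance block
        diff = x - mu_x
        sol = np.linalg.solve(s_xx, diff)  # s_xx^{-1} (x - mu_x)
        _, logdet = np.linalg.slogdet(s_xx)
        # Log of weight times the Gaussian density of x for this component.
        log_post[m] = np.log(weights[m]) - 0.5 * (
            diff @ sol + logdet + dx * np.log(2.0 * np.pi)
        )
        # Component-wise conditional mean of y given x.
        cond_means.append(mu_y + s_yx @ sol)

    # Normalize in the log domain to obtain posteriors P(m | x).
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    # Posterior-weighted sum of the conditional means.
    return post @ np.array(cond_means)
```

In a system like the ones described, such a mapping could in principle be applied separately to each feature stream (spectral parameters, F0, aperiodic components), since the abstract states that they are estimated independently. How the features are extracted and how the GMMs are trained is outside the scope of this sketch.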