Alaryngeal Speech Enhancement Based on One-to-Many Eigenvoice Conversion

Authors:
Hironori Doi;Tomoki Toda;Keigo Nakamura;Hiroshi Saruwatari;Kiyohiro Shikano
Affiliations:
Graduate School of Information Science, Nara Institute of Science and Technology, Ikoma, Japan (all authors)
Venue:
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP)
Year:
2014

Abstract

In this paper, we present novel speaking-aid systems based on one-to-many eigenvoice conversion (EVC) that enhance three types of alaryngeal speech: esophageal speech, electrolaryngeal speech, and body-conducted silent electrolaryngeal speech. Although alaryngeal speech allows laryngectomees to produce speech sounds, it suffers from poor speech quality and a lack of speaker individuality. To improve its speech quality, alaryngeal-speech-to-speech (AL-to-Speech) methods based on statistical voice conversion have been proposed. Here we further incorporate one-to-many EVC into the AL-to-Speech methods: because EVC adapts the conversion model to given natural target voices, the converted voice quality can be controlled flexibly, which helps recover speaker individuality for each type of alaryngeal speech. The proposed systems are compared with one another from various perspectives. The experimental results demonstrate that the proposed systems effectively address the issues of alaryngeal speech, e.g., they yield significant improvements in the speech quality of each type of alaryngeal speech.
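
The adaptation idea behind one-to-many EVC can be illustrated with a small sketch. It assumes the common eigenvoice GMM formulation, in which the target mean of each mixture component is a bias vector plus a weighted sum of eigenvectors, and it uses a simple frame-wise minimum mean-square-error (MMSE) mapping in place of whatever conversion procedure the paper actually uses. All function names, array shapes, and parameters below are hypothetical and are not taken from the paper; the weight vector `w` is assumed to have been estimated already from target speech.

```python
import numpy as np

def adapt_target_means(B, b0, w):
    """Adapted target mean per mixture: mu_m = B_m @ w + b0_m.

    B  : (M, D, J) eigenvectors for M mixtures (assumed layout)
    b0 : (M, D)    bias vectors
    w  : (J,)      speaker weight vector (assumed already estimated)
    """
    return np.einsum("mdj,j->md", B, w) + b0

def convert_frame(x, weights, mu_x, mu_y, cov_xx, cov_yx):
    """Frame-wise MMSE conversion with a joint-density GMM (illustrative only)."""
    M = len(weights)
    log_post = np.empty(M)
    for m in range(M):
        diff = x - mu_x[m]
        _, logdet = np.linalg.slogdet(cov_xx[m])
        maha = diff @ np.linalg.solve(cov_xx[m], diff)
        # The constant term shared by all mixtures is omitted.
        log_post[m] = np.log(weights[m]) - 0.5 * (logdet + maha)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()  # posterior probability of each mixture given x

    y = np.zeros(mu_y.shape[1])
    for m in range(M):
        # Conditional mean of the target given the source for mixture m.
        cond = mu_y[m] + cov_yx[m] @ np.linalg.solve(cov_xx[m], x - mu_x[m])
        y += post[m] * cond
    return y
```

In this sketch, changing only `w` moves every target mean at once, which is what lets a single conversion model be steered toward a different natural target voice without retraining the source-side parameters.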