Voice Conversion by Prosody and Vocal Tract Modification

  • Authors:
  • K. Sreenivasa Rao;B. Yegnanarayana

  • Affiliations:
  • Indian Institute of Technology Guwahati;Indian Institute of Technology Madras

  • Venue:
  • ICIT '06 Proceedings of the 9th International Conference on Information Technology
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper we proposed some flexible methods, which are useful in the process of voice conversion. The proposed methods modify the shape of the vocal tract system and the characteristics of the prosody according to the desired requirement. The shape of the vocal tract system is modified by shifting the major resonant frequencies (formants) of the short term spectrum, and altering their bandwidths accordingly. In the case of prosody modification, the required durational and intonational characteristics are imposed on the given speech signal. In the proposed method, the prosodic characteristics are manipulated using instants of significant excitation. The instants of significant excitation correspond to the instants of glottal closure (epochs) in the case of voiced speech, and to some random excitations like onset of burst in the case of nonvoiced speech. Instants of significant excitation are computed from the Linear Prediction (LP) residual of the speech signals by using the property of average group delay of minimum phase signals. The manipulations of durational characteristics and pitch contour (intonation pattern) are achieved by manipulating the LP residual with the help of the knowledge of the instants of significant excitation. The modified LP residual is used to excite the time varying filter. The filter parameters are updated according to the desired vocal tract characteristics. The proposed methods are evaluated using listening tests.