Unit Selection Using Linguistic, Prosodic and Spectral Distance for Developing Text-to-Speech System in Hindi

  • Authors:
  • K. Sreenivasa Rao;Sudhamay Maity;Amol Taru;Shashidhar G. Koolagudi

  • Affiliations:
  • School of Information Technology, Indian Institute of Technology Kharagpur, Kharagpur, India 721302;School of Information Technology, Indian Institute of Technology Kharagpur, Kharagpur, India 721302;School of Information Technology, Indian Institute of Technology Kharagpur, Kharagpur, India 721302;School of Information Technology, Indian Institute of Technology Kharagpur, Kharagpur, India 721302

  • Venue:
  • PReMI '09 Proceedings of the 3rd International Conference on Pattern Recognition and Machine Intelligence
  • Year:
  • 2009

Abstract

In this paper we propose a new method for unit selection in developing a text-to-speech (TTS) system for Hindi. In the proposed method, syllables are used as the basic units for concatenation. Linguistic, positional and contextual features derived from the input text are used at the first level of the unit selection process. The selection is then refined by incorporating prosodic and spectral characteristics at the utterance and syllable levels. The speech corpus considered for this task is broadcast Hindi news read by a male speaker. Synthesized speech from the developed TTS system using the multi-level unit selection criterion is evaluated using listening tests. The evaluation results show that refining the unit selection process with spectral and prosodic features improves the quality of the synthesized speech.
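The two-level selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the feature set, the distance measures, the weights, or the search strategy, so everything below is an assumption made for illustration. The names `Unit`, `first_level`, and `select_units` are hypothetical, Euclidean distance stands in for whatever prosodic and spectral distances the paper uses, and a simple greedy left-to-right pass replaces any search the paper may perform.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Unit:
    """One recorded syllable instance from the corpus (hypothetical layout)."""
    syllable: str
    context: dict      # linguistic, positional and contextual features
    prosody: tuple     # e.g. (duration_ms, mean_f0_hz); illustrative only
    start_spec: tuple  # spectral vector at the unit's left edge
    end_spec: tuple    # spectral vector at the unit's right edge


def euclidean(a, b):
    """Plain Euclidean distance, used here as a stand-in distance measure."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def first_level(target_context, candidates):
    """Level 1: keep the candidates that match the most text-derived features."""
    def matches(unit):
        return sum(1 for k, v in target_context.items()
                   if unit.context.get(k) == v)

    best = max(matches(u) for u in candidates)
    return [u for u in candidates if matches(u) == best]


def select_units(targets, corpus, w_prosody=1.0, w_spectral=1.0):
    """Greedy two-level selection.

    `targets` is a list of (syllable, context, prosody) tuples; the target
    prosody values are assumed to be supplied by some other component.
    """
    chosen = []
    for syllable, context, prosody in targets:
        candidates = [u for u in corpus if u.syllable == syllable]
        if not candidates:
            raise ValueError(f"no unit for syllable {syllable!r}")
        shortlist = first_level(context, candidates)

        def cost(unit):
            # Level 2: prosodic mismatch with the target, plus spectral
            # mismatch at the join with the previously chosen unit.
            c = w_prosody * euclidean(unit.prosody, prosody)
            if chosen:
                c += w_spectral * euclidean(chosen[-1].end_spec,
                                            unit.start_spec)
            return c

        chosen.append(min(shortlist, key=cost))
    return chosen
```

In this sketch the first level only narrows the candidate set, and the second level decides among the survivors, which mirrors the refinement order stated in the abstract. The weights `w_prosody` and `w_spectral` are placeholders and would need tuning against listening tests.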