The paradigm for creating multi-lingual text-to-speech voice databases

  • Authors:
  • Min Chu;Yong Zhao;Yining Chen;Lijuan Wang;Frank Soong

  • Affiliations:
  • Microsoft Research Asia, Beijing;Microsoft Research Asia, Beijing;Microsoft Research Asia, Beijing;Microsoft Research Asia, Beijing;Microsoft Research Asia, Beijing

  • Venue:
  • ISCSLP'06 Proceedings of the 5th international conference on Chinese Spoken Language Processing
  • Year:
  • 2006

Abstract

The voice database is one of the most important parts of a TTS system. However, creating a new high-quality TTS voice is not an easy task, even for a professional team. The whole process is rather complicated and contains many details that must be handled carefully. In fact, many stages require human intervention, such as manual checking or labeling. In multi-lingual situations, it is even more challenging to find qualified people to do this kind of work. That is why most state-of-the-art TTS systems can provide only a few voices. In this paper, we outline a uniform paradigm for creating multi-lingual TTS voice databases. It focuses on technologies that can either improve the scalability of data collection or reduce the need for human intervention. With this paradigm, we decrease the complexity and workload of the task.