Corpus design for a unit selection TtS system with application to Bulgarian

  • Authors:
  • Aimilios Chalamandaris; Pirros Tsiakoulis; Spyros Raptis; Sotiris Karabetsos

  • Affiliations:
  • Institute for Language and Speech Processing - Athena Research Centre, Athens, Greece (all four authors)

  • Venue:
  • LTC'09 Proceedings of the 4th conference on Human language technology: challenges for computer science and linguistics
  • Year:
  • 2009

Abstract

In this paper we present the process of designing an efficient speech corpus for the first unit selection speech synthesis system for Bulgarian, along with some significant preliminary results regarding the quality of the resulting system. Because the initial corpus is a crucial factor in the quality delivered by a Text-to-Speech system, special effort has been devoted to designing a complete and efficient corpus for use in a unit selection TTS system. The target domain of the TTS system, and hence of the corpus, is news reports; although this domain is restricted, it is characterized by an unlimited vocabulary. The paper focuses on issues in designing an optimal corpus for such a framework and on the ideas on which our approach was based. A novel multistage approach is presented, with special attention given to language- and speaker-dependent issues, since they affect the entire process. The paper concludes with the presentation of our results and the evaluation experiments, which provide clear evidence of the quality level achieved.
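The abstract does not describe the multistage procedure itself. For orientation only, a widely used baseline in unit selection corpus design is greedy sentence selection: repeatedly pick the candidate sentence that adds the most not-yet-covered units. The sketch below shows that generic baseline, not the authors' method. It assumes that sentences are already transcribed into phone sequences, that diphones are the target unit, and that `_` marks silence; the names `diphones` and `greedy_select` are hypothetical.

```python
from typing import Dict, List, Set


def diphones(phones: List[str]) -> Set[str]:
    """Return the set of adjacent phone pairs (diphones) in a phone sequence.
    The sequence is padded with a silence marker '_' at both ends
    (an assumption of this sketch, not taken from the paper)."""
    padded = ["_"] + phones + ["_"]
    return {padded[i] + "-" + padded[i + 1] for i in range(len(padded) - 1)}


def greedy_select(candidates: Dict[str, List[str]], max_sentences: int) -> List[str]:
    """Hypothetical greedy baseline: pick sentences that add the most new diphones,
    normalised by sentence length so that short, unit-dense sentences are favoured.
    Stops when the budget is reached or no remaining sentence adds a new diphone."""
    covered: Set[str] = set()
    selected: List[str] = []
    remaining = dict(candidates)
    while remaining and len(selected) < max_sentences:
        def score(item):
            _, phones = item
            new_units = diphones(phones) - covered
            return len(new_units) / max(len(phones), 1)

        # max() returns the first maximal item, so ties go to insertion order
        best_id, best_phones = max(remaining.items(), key=score)
        gain = diphones(best_phones) - covered
        if not gain:
            break  # every remaining sentence is redundant
        covered |= gain
        selected.append(best_id)
        del remaining[best_id]
    return selected


# Usage: letters stand in for phones purely for illustration.
corpus = {"s1": list("abc"), "s2": list("abcd"), "s3": list("xy")}
print(greedy_select(corpus, 2))
```

Normalising the gain by sentence length keeps the recording script short, which matters when a single speaker must read the whole corpus; a real design would also weigh prosodic context and domain frequency, which this sketch ignores.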