Efficient and reliable perceptual weight tuning for unit-selection text-to-speech synthesis based on active interactive genetic algorithms: A proof-of-concept

  • Authors:
  • Francesc Alías; Lluís Formiga; Xavier Llorà

  • Affiliations:
  • GTM - Grup de Recerca en Tecnologies Mèdia, La Salle - Universitat Ramon Llull, C/Quatre Camins 2, 08022 Barcelona, Spain (F. Alías, L. Formiga); National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, 1205 W. Clark Street, Urbana, IL 61801, USA (X. Llorà)

  • Venue:
  • Speech Communication

  • Year:
  • 2011

Abstract

Unit-selection speech synthesis is one of the current corpus-based text-to-speech synthesis techniques. The quality of the generated speech depends on the accuracy of the unit-selection process, which in turn relies on the definition of the cost function. This function should capture the perceptual preferences of users when selecting synthesis units, which is still an open research issue. This paper proposes a complete methodology for tuning the cost function weights that fuses human judgments into the cost function through efficient and reliable interactive weight tuning. To that end, active interactive genetic algorithms (aiGAs) are used to guide the subjective weight adjustments. Applying aiGAs to this process mitigates user fatigue and frustration by improving user consistency. However, subjectively adjusting the weights of every unit in the corpus (diphones and triphones in this work) remains infeasible, so the units must be clustered before the tuning process is conducted. The aiGA-based weight tuning proposal is evaluated on a small speech corpus as a proof of concept and yields more natural synthetic speech than previous objective and subjective approaches.
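
The following Python sketch illustrates the two ingredients the abstract describes: a weighted unit-selection cost and an interactive evolutionary loop over its weight vector. It is a minimal toy, not the authors' implementation. The sub-cost features are random stand-ins for the corpus diphone/triphone sub-costs, listener_prefers is a hypothetical oracle that replaces the human listening test, and the surrogate preference model that distinguishes an aiGA from a plain interactive GA is omitted.

    import random

    # Hypothetical number of sub-costs (target and concatenation) per unit.
    N_FEATURES = 4

    def unit_cost(weights, features):
        """Weighted sum of sub-costs, the usual unit-selection cost form."""
        return sum(w * f for w, f in zip(weights, features))

    def utterance_cost(weights, utterance_features):
        """Total cost of a candidate unit sequence under a weight vector."""
        return sum(unit_cost(weights, f) for f in utterance_features)

    def listener_prefers(weights_a, weights_b, utterance_features):
        """Stand-in for the interactive step: a human listens to the two
        synthetic versions and picks the preferred one. Here a fixed 'true'
        weight vector plays the listener, which is purely illustrative."""
        true_w = [0.4, 0.3, 0.2, 0.1]  # hypothetical perceptual ground truth
        a = abs(utterance_cost(weights_a, utterance_features) -
                utterance_cost(true_w, utterance_features))
        b = abs(utterance_cost(weights_b, utterance_features) -
                utterance_cost(true_w, utterance_features))
        return a <= b

    def normalize(w):
        s = sum(w)
        return [x / s for x in w]

    def tune_weights(utterance_features, pop_size=8, generations=20, seed=0):
        """Toy interactive GA: pairwise tournaments decided by the listener,
        then mutation of the winners. An aiGA additionally learns a surrogate
        model of the user's preferences to cut down the number of listening
        comparisons; that model is omitted here."""
        rng = random.Random(seed)
        pop = [normalize([rng.random() for _ in range(N_FEATURES)])
               for _ in range(pop_size)]
        for _ in range(generations):
            rng.shuffle(pop)
            winners = [a if listener_prefers(a, b, utterance_features) else b
                       for a, b in zip(pop[::2], pop[1::2])]
            # Refill the population with mutated copies of the winners.
            pop = winners + [normalize([max(1e-6, w + rng.gauss(0, 0.05))
                                        for w in rng.choice(winners)])
                             for _ in range(pop_size - len(winners))]
        return pop[0]

    if __name__ == "__main__":
        rng = random.Random(1)
        utterance = [[rng.random() for _ in range(N_FEATURES)]
                     for _ in range(10)]
        print("tuned weights:", tune_weights(utterance))

In the actual interactive setting, each call to listener_prefers would be a paired listening test, so the number of such calls is the budget that matters; the aiGA's learned model of past judgments pre-screens candidate weight vectors so that fewer comparisons reach the user, which is how the fatigue and consistency gains reported in the abstract arise.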