Traditional concatenative speech synthesis systems use a number of heuristics to define the target and concatenation costs, which are essential to the design of the unit selection component. In contrast to these approaches, we introduce a general statistical modeling framework for unit selection inspired by automatic speech recognition. Given appropriate data, techniques based on that framework can result in more accurate unit selection, thereby improving the general quality of a speech synthesizer. They can also lead to a more modular and substantially more efficient system.

We present a new unit selection system based on statistical modeling. To overcome the original absence of data, we use an existing high-quality unit selection system to generate a corpus of unit sequences. We show that the concatenation cost can be accurately estimated from this corpus using a statistical n-gram language model over units. We use weighted automata and transducers to represent the components of the system and design a new and more efficient composition algorithm that makes use of string potentials for their combination. The resulting statistical unit selection is shown to be about 2.6 times faster than the last release of the AT&T Natural Voices Product while preserving the same quality, and it offers much flexibility for the use and integration of new and more complex components.
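To make the idea of estimating a concatenation cost from a corpus of unit sequences concrete, the following is a minimal sketch, not the system described above. It assumes a bigram model with add-alpha smoothing and defines the cost of joining two units as the negative log of the smoothed conditional probability. The unit names (`u1`, `u2`, ...), the function names, the start marker `<s>`, and the choice of smoothing are all hypothetical and chosen for illustration only; the abstract does not specify the n-gram order or the smoothing method.

```python
import math
from collections import Counter


def train_bigram_costs(sequences, alpha=1.0):
    """Build cost(prev, unit) = -log P(unit | prev) from unit sequences.

    Assumption: add-alpha smoothing over the observed unit vocabulary.
    """
    context_counts = Counter()  # how often each unit appears as a left context
    bigram_counts = Counter()   # how often (prev, unit) appear consecutively
    vocab = set()
    for seq in sequences:
        padded = ["<s>"] + list(seq)  # hypothetical start-of-sequence marker
        vocab.update(seq)
        for prev, unit in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, unit)] += 1
    vocab_size = len(vocab)

    def cost(prev, unit):
        # Smoothed conditional probability, turned into an additive cost.
        p = (bigram_counts[(prev, unit)] + alpha) / (
            context_counts[prev] + alpha * vocab_size
        )
        return -math.log(p)

    return cost


def sequence_cost(cost, seq):
    """Total concatenation cost of a unit sequence under the bigram model."""
    padded = ["<s>"] + list(seq)
    return sum(cost(prev, unit) for prev, unit in zip(padded, padded[1:]))


# Toy corpus standing in for sequences produced by an existing synthesizer.
corpus = [["u1", "u2", "u3"], ["u1", "u2", "u4"], ["u5", "u2", "u3"]]
cost = train_bigram_costs(corpus)
```

Under this sketch, a join seen often in the corpus (such as `u1` followed by `u2`) receives a lower cost than an unseen one. Because the costs are negative log probabilities, they add along a path, which is what allows such a model to be encoded as a weighted automaton over the tropical semiring.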