Time and space-efficient architecture for a corpus-based text-to-speech synthesis system

Authors:
Matej Rojc;Zdravko Kačič
Affiliations:
Faculty of Electrical Engineering and Computer Science, University of Maribor, Smetanova ulica 17, 2000 Maribor, Slovenia;Faculty of Electrical Engineering and Computer Science, University of Maribor, Smetanova ulica 17, 2000 Maribor, Slovenia
Venue:
Speech Communication
Year:
2007

Abstract

This paper proposes a time- and space-efficient architecture for a text-to-speech (TTS) synthesis system. The architecture is suited to unlimited-domain applications that require multilingual or polyglot functionality. Integrating a queuing mechanism, heterogeneous relation graphs, and finite-state machines yields a powerful, reliable, and easily maintainable TTS architecture. The flexible, language-independent framework efficiently integrates all the algorithms used within the TTS system. Heterogeneous relation graphs are used for representing linguistic information and for feature construction. Finite-state machines provide time- and space-efficient representation of language resources and time- and space-efficient lookup, and they separate language-dependent resources from the language-independent TTS engine. The queuing mechanism consists of several deque (double-ended queue) data structures and is responsible for activating the TTS engine modules that have to process the input text. All modules use the same data structure for gathering linguistic information about the input text. All input and output formats are compatible, and the structure is modular, interchangeable, easily maintainable, and object-oriented. The architecture was successfully used to implement PLATTOS, a Slovenian corpus-based TTS system, as presented in this paper.
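
The queuing idea in the abstract can be illustrated with a minimal Python sketch. It is not the PLATTOS implementation: the `Utterance` class, the two toy modules, and `run_pipeline` are hypothetical names chosen for illustration, and a plain dictionary stands in for the heterogeneous relation graph. The sketch only shows the general pattern of one deque per module, a module being activated when its input deque holds work, and every module reading from and writing to the same shared structure.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Utterance:
    """Hypothetical shared structure that every module reads and writes."""
    text: str
    # Assumption: a dict of named relations stands in for the relation graph.
    relations: dict = field(default_factory=dict)


def tokenize(utt: Utterance) -> None:
    """Toy module: split the input text on whitespace."""
    utt.relations["Token"] = utt.text.split()


def normalize_words(utt: Utterance) -> None:
    """Toy module: lowercase tokens and strip trailing punctuation."""
    utt.relations["Word"] = [
        t.lower().strip(".,!?") for t in utt.relations["Token"]
    ]


# Assumption: modules run in a fixed order; the paper's engine has more stages.
MODULES = [tokenize, normalize_words]


def run_pipeline(sentences):
    # queues[i] feeds MODULES[i]; the final deque collects finished utterances.
    queues = [deque() for _ in range(len(MODULES) + 1)]
    queues[0].extend(Utterance(s) for s in sentences)

    # A module is activated whenever its input deque is non-empty.
    while any(queues[:-1]):
        for i, module in enumerate(MODULES):
            if queues[i]:
                utt = queues[i].popleft()
                module(utt)
                queues[i + 1].append(utt)
    return list(queues[-1])


if __name__ == "__main__":
    for utt in run_pipeline(["Hello world.", "Dober dan!"]):
        print(utt.relations["Word"])
```

Because each module only touches the shared `Utterance` object and its own input and output deques, a module can be replaced or reordered without changing the others, which is the kind of modularity the abstract describes.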