Special speech synthesis for social network websites

Authors:
Csaba Zainkó; Tamás Gábor Csapó; Géza Németh
Affiliations:
Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Hungary (all authors)
Venue:
TSD'10 Proceedings of the 13th international conference on Text, speech and dialogue
Year:
2010

Abstract

This paper gives an overview of the design concepts and implementation of a Hungarian microblog reading system. Synthesizing speech from this kind of text requires dedicated components. First, an efficient diacritic reconstruction algorithm was applied: the accuracy of an earlier dictionary-based method was improved with machine learning so that ambiguous cases are handled properly. Second, an unlimited-domain text-to-speech synthesizer was extended to support emotional and spontaneous styles. Chat and blog texts often contain "emoticons" that mark the emotional state of the user, so an expressive speech synthesis method was adapted to a corpus-based synthesizer. Four emotions were generated and evaluated in a listening test: neutral, happy, angry and sad. The results showed that happy and sad emotions can be generated with this algorithm, with the best accuracy for the female voice.
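
The two text-processing steps named in the abstract, diacritic reconstruction and emoticon handling, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the lexicon, the context hints, the emoticon table and the function names are all hypothetical toy data invented for the illustration, and a simple left-neighbour lookup stands in for the machine-learning step that the paper uses to resolve ambiguous words.

```python
# Toy lexicon (hypothetical): unaccented form -> possible accented forms.
# The first candidate is assumed to be the most frequent one.
LEXICON = {
    "koszonom": ["köszönöm"],
    "szep": ["szép"],
    "zart": ["zárt"],
    "sulyos": ["súlyos"],
    "kor": ["kor", "kör", "kór"],  # ambiguous: age / circle / disease
}

# Stand-in for a learned disambiguator (hypothetical data):
# (previous unaccented word, ambiguous word) -> preferred accented form.
CONTEXT_HINTS = {
    ("zart", "kor"): "kör",    # "zárt kör" (closed circle)
    ("sulyos", "kor"): "kór",  # "súlyos kór" (severe disease)
}

# Emoticon table (hypothetical) covering the three non-neutral emotions
# evaluated in the paper.
EMOTICONS = {
    ":)": "happy", ":-)": "happy",
    ":(": "sad", ":-(": "sad",
    ">:(": "angry",
}


def restore_diacritics(text):
    """Restore accents token by token; punctuation handling is omitted."""
    restored = []
    previous = ""
    for token in text.split():
        key = token.lower()
        candidates = LEXICON.get(key, [token])
        if len(candidates) == 1:
            choice = candidates[0]
        else:
            # Ambiguous word: use the context hint, else the first candidate.
            choice = CONTEXT_HINTS.get((previous, key), candidates[0])
        if key in LEXICON and token[:1].isupper():
            # Keep the capitalisation of the original first letter.
            choice = choice[:1].upper() + choice[1:]
        restored.append(choice)
        previous = key
    return " ".join(restored)


def detect_emotion(text):
    """Return the emotion of the first emoticon found, else 'neutral'."""
    for token in text.split():
        if token in EMOTICONS:
            return EMOTICONS[token]
    return "neutral"
```

In a full pipeline the restored text and the detected emotion label would be passed on to the synthesizer. Splitting on whitespace keeps the sketch short and prevents `:(` from matching inside `>:(`, but it would miss emoticons attached directly to a word.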