Predicting utterance pitch targets in Yorùbá for tone realisation in speech synthesis

Authors:
Daniel R. Van Niekerk;Etienne Barnard
Affiliations:
Multilingual Speech Technologies, North-West University, Vanderbijlpark, South Africa and Human Language Technologies Research Group, Meraka Institute, CSIR, Pretoria, South Africa;Multilingual Speech Technologies, North-West University, Vanderbijlpark, South Africa
Venue:
Speech Communication
Year:
2014

Abstract

Pitch is a fundamental acoustic feature of speech and must therefore be determined during speech synthesis. While pitch variation serves a range of communicative functions in all languages, it plays a vital role in distinguishing the meaning of lexical items in tone languages. Because a number of factors are assumed to affect the realisation of pitch, it is important to know which mechanisms are systematically responsible for it, so that these can be modelled effectively and robust speech synthesis systems can be developed in under-resourced environments. To this end, features influencing syllable pitch targets in continuous utterances in Yorùbá are investigated in a small speech corpus of 4 speakers. The pitch level of the previous syllable is found to be strongly correlated with pitch changes between syllables, and a number of approaches and features are evaluated in this context. The resulting models can be used to predict utterance pitch targets for speech synthesisers (whether concatenative or statistical parametric), and may also prove useful in speech recognition systems.
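
The relationship described in the abstract, between the pitch level of the previous syllable and the pitch change to the current one, can be illustrated with a minimal sketch. This is not the authors' method or data: the function names (`previous_and_change`, `pearson`, `fit_line`) and the toy pitch targets are hypothetical, chosen only to show how such a correlation and a simple linear predictor might be computed from a sequence of per-syllable pitch targets. The sign and strength of the correlation in the toy numbers are properties of the made-up values, not results from the paper.

```python
from math import sqrt


def previous_and_change(targets):
    """Pair each preceding syllable pitch target with the change to the next one."""
    return [(prev, cur - prev) for prev, cur in zip(targets, targets[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return slope, mean_y - slope * mean_x


# Hypothetical per-syllable pitch targets (arbitrary units); NOT data from the paper.
toy_targets = [4.0, 1.0, 3.0, 0.0, 2.5, 0.5, 2.0]

pairs = previous_and_change(toy_targets)
prev_pitch = [p for p, _ in pairs]
pitch_change = [d for _, d in pairs]

r = pearson(prev_pitch, pitch_change)
slope, intercept = fit_line(prev_pitch, pitch_change)

# Predict the next pitch target from the last observed one using the fitted line.
predicted_next = toy_targets[-1] + (slope * toy_targets[-1] + intercept)
print(f"r = {r:.3f}, slope = {slope:.3f}, predicted next target = {predicted_next:.3f}")
```

A predictor of this kind uses only the previous pitch level. The abstract indicates that a number of further features and approaches are evaluated, and this sketch does not attempt to represent them.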