Pitch is a fundamental acoustic feature of speech and therefore has to be determined during speech synthesis. Pitch variation serves a range of communicative functions in all languages. In tone languages it also plays a vital role in distinguishing the meanings of lexical items. A number of factors are assumed to affect how pitch is realised. It is therefore important to know which mechanisms are systematically responsible for pitch realisation, so that they can be modelled effectively and robust speech synthesis systems can be developed in under-resourced environments. To this end, features influencing syllable pitch targets in continuous Yoruba utterances are investigated using a small speech corpus of four speakers. The pitch level of the previous syllable is found to be strongly correlated with the pitch change between syllables, and a number of approaches and features are evaluated in this context. The resulting models can be used to predict utterance pitch targets for speech synthesisers, whether concatenative or statistical parametric, and may also prove useful in speech recognition systems.
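The abstract does not state which models or features were used. The following is a minimal sketch of the kind of analysis it describes. It rests on several assumptions that are not from the source. Each syllable carries one of Yoruba's three level tones, labelled `H`, `M` or `L`. Pitch targets are given in semitones. The pitch change into a syllable is modelled as a separate linear function of the previous syllable's pitch for each tone. The function names (`pearson`, `transitions`, `fit_per_tone`, `predict_targets`) are hypothetical. The data in the tests is synthetic and does not come from the corpus.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: one utterance is a sequence of
# (tone_label, pitch_in_semitones) pairs, one pair per syllable.
Utterance = Sequence[Tuple[str, float]]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def transitions(utts: Sequence[Utterance]) -> List[Tuple[str, float, float]]:
    """Return (current tone, previous pitch, pitch change) for each adjacent syllable pair."""
    out = []
    for utt in utts:
        for (_, p_prev), (tone, p_cur) in zip(utt, utt[1:]):
            out.append((tone, p_prev, p_cur - p_prev))
    return out


def fit_per_tone(utts: Sequence[Utterance]) -> Dict[str, Tuple[float, float]]:
    """Least-squares fit of change = slope * previous_pitch + intercept, one line per tone."""
    groups: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for tone, p_prev, delta in transitions(utts):
        groups[tone].append((p_prev, delta))
    model = {}
    for tone, pairs in groups.items():
        n = len(pairs)
        mx = sum(x for x, _ in pairs) / n
        my = sum(y for _, y in pairs) / n
        sxx = sum((x - mx) ** 2 for x, _ in pairs)
        if sxx == 0:
            raise ValueError(f"previous pitch does not vary for tone {tone!r}")
        slope = sum((x - mx) * (y - my) for x, y in pairs) / sxx
        model[tone] = (slope, my - slope * mx)
    return model


def predict_targets(model: Dict[str, Tuple[float, float]],
                    tones: Sequence[str], first_pitch: float) -> List[float]:
    """Roll the model forward: each target = previous target + predicted change."""
    targets = [first_pitch]
    for tone in tones[1:]:  # the first syllable's pitch is given, not predicted
        slope, intercept = model[tone]
        prev = targets[-1]
        targets.append(prev + slope * prev + intercept)
    return targets
```

The correlation is computed over pooled `(previous pitch, change)` pairs from `transitions`. The per-tone fit then conditions on the tone of the current syllable. A real system would be trained on measured targets and would probably add further features, such as position in the utterance or speaker normalisation.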