Intonation modeling for Indian languages

Authors:
K. Sreenivasa Rao; B. Yegnanarayana
Affiliations:
School of Information Technology, Indian Institute of Technology Kharagpur, Kharagpur 721 302, West Bengal, India; International Institute of Information Technology (IIIT), Gachibowli, Hyderabad 500 032, Andhra Pradesh, India
Venue:
Computer Speech and Language
Year:
2009

Abstract

In this paper we propose models for predicting the intonation of the sequence of syllables in an utterance. The term intonation refers to the temporal changes of the fundamental frequency (F0). Neural networks are used to capture the implicit intonation knowledge in the sequence of syllables of an utterance, and the models predict the sequence of F0 values for a given sequence of syllables. Labeled broadcast news data in Hindi, Telugu and Tamil are used to develop the neural network models for predicting the F0 of syllables in these languages. The input to the neural network is a feature vector representing the positional, contextual and phonological constraints. The interaction between duration and intonation constraints can be exploited to improve the accuracy further. From the studies we find that 88% of the F0 values (pitch) of the syllables could be predicted by the models within 15% of the actual F0. The performance of the intonation models is evaluated using objective measures such as average prediction error (μ), standard deviation (σ) and correlation coefficient (γ). The prediction accuracy is further evaluated using listening tests. The prediction performance of the proposed neural network models is compared with that of Classification and Regression Tree (CART) models.
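
The objective measures named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so the definitions here are assumptions: the average prediction error is taken as the mean absolute difference between actual and predicted F0, the standard deviation as the spread of those absolute errors, the correlation coefficient as Pearson's coefficient, and the tolerance figure as the percentage of syllables whose predicted F0 deviates from the actual F0 by at most 15%. The function name `evaluate_f0` and its parameters are hypothetical and do not come from the paper.

```python
import math


def evaluate_f0(actual, predicted, tolerance_pct=15.0):
    """Compare predicted syllable F0 values (Hz) with actual ones.

    All formulas are assumed definitions, not taken from the paper.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)

    # Absolute prediction error per syllable.
    errors = [abs(a - p) for a, p in zip(actual, predicted)]

    # Average prediction error and the spread of the errors around it.
    mean_abs_error = sum(errors) / n
    error_std = math.sqrt(sum((e - mean_abs_error) ** 2 for e in errors) / n)

    # Pearson correlation between actual and predicted values.
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        correlation = float("nan")  # undefined for a constant sequence
    else:
        correlation = cov / math.sqrt(var_a * var_p)

    # Share of syllables predicted within the given percentage of the
    # actual F0 (actual values are assumed to be positive).
    within = sum(
        1 for a, e in zip(actual, errors) if 100.0 * e / a <= tolerance_pct
    )

    return {
        "mean_abs_error": mean_abs_error,
        "error_std": error_std,
        "correlation": correlation,
        "within_tolerance_pct": 100.0 * within / n,
    }
```

Under these assumed definitions, a result such as the reported 88% would correspond to `within_tolerance_pct` computed over all test syllables with `tolerance_pct=15.0`.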