Application of prosody models for developing speech systems in Indian languages

Authors:
K. Sreenivasa Rao
Affiliations:
School of Information Technology, Indian Institute of Technology Kharagpur, Kharagpur, India 721302
Venue:
International Journal of Speech Technology
Year:
2011


Abstract

This paper demonstrates the use of prosody models for developing speech systems in Indian languages. The prosody models considered are duration and intonation models built with feedforward neural networks. Labelled broadcast news data in Hindi, Telugu, Tamil and Kannada are used to train the neural networks to predict duration and intonation, with input features that represent positional, contextual and phonological constraints. The use of the prosody models is illustrated in four applications: speech recognition, speech synthesis, speaker recognition and language identification. Autoassociative neural networks and support vector machines serve as the classification models for these speech systems. The performance of the speech systems is shown to improve when prosodic features are combined with a popular spectral feature set, Weighted Linear Prediction Cepstral Coefficients (WLPCCs).
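As a rough illustration of the duration-modelling idea in the abstract, the sketch below trains a small feedforward network to map a feature vector to a duration value. It is not the paper's model: the network size, learning rate, the three input columns (stand-ins for normalized positional, contextual and phonological features) and the synthetic targets are all assumptions made for this example, and the names `train_duration_net` and `predict_duration` are hypothetical.

```python
import math
import random


def _hidden(W1, b1, x):
    """Tanh activations of the single hidden layer for one input vector."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W1, b1)]


def train_duration_net(X, y, hidden=4, lr=0.05, epochs=200, seed=0):
    """Fit a one-hidden-layer feedforward regressor with per-sample gradient descent."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            h = _hidden(W1, b1, x)
            pred = sum(w * hj for w, hj in zip(W2, h)) + b2  # linear output unit
            err = pred - target                               # gradient of 0.5 * err**2
            for j in range(hidden):
                # Backpropagate through tanh using the pre-update output weight.
                g_h = err * W2[j] * (1.0 - h[j] ** 2)
                W2[j] -= lr * err * h[j]
                for i in range(n_in):
                    W1[j][i] -= lr * g_h * x[i]
                b1[j] -= lr * g_h
            b2 -= lr * err
    return W1, b1, W2, b2


def predict_duration(params, x):
    """Predict a (normalized) duration for one feature vector."""
    W1, b1, W2, b2 = params
    return sum(w * hj for w, hj in zip(W2, _hidden(W1, b1, x))) + b2


# Synthetic stand-in data (NOT from the paper): three normalized features per
# syllable and a made-up normalized duration target.
data_rng = random.Random(1)
X = [[data_rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(100)]
y = [0.5 * a - 0.3 * b + 0.2 * c for a, b, c in X]

params = train_duration_net(X, y)
mse = sum((predict_duration(params, x) - t) ** 2 for x, t in zip(X, y)) / len(y)
mean_y = sum(y) / len(y)
baseline = sum((t - mean_y) ** 2 for t in y) / len(y)  # error of always predicting the mean
```

Comparing `mse` with `baseline` gives a quick check that the network has learned something beyond the mean duration. With real data, the inputs would be the encoded linguistic features and the targets the measured syllable durations.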