Modeling durations of syllables using neural networks

Authors:
K. Sreenivasa Rao; B. Yegnanarayana
Affiliations:
Department of Electronics and Communication Engineering, Indian Institute of Technology Guwahati, North Guwahati, Guwahati 781 039, India; Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai 600 036, India
Venue:
Computer Speech and Language
Year:
2007

Abstract

In this paper, we propose a neural network model for predicting the durations of syllables. A four-layer feedforward neural network trained with the backpropagation algorithm is used to model the duration knowledge of syllables. Broadcast news data in three Indian languages (Hindi, Telugu and Tamil) are used for this study. The input to the neural network consists of a set of features extracted from the text. These features correspond to phonological, positional and contextual information. The relative importance of the positional and contextual features is examined separately. To improve prediction accuracy, the predicted duration values are processed further. We also propose a two-stage duration model for the same purpose. Our studies show that the models predict 85% of the syllable durations within 25% of the actual duration. The performance of the duration models is evaluated using objective measures: average prediction error (μ), standard deviation (σ) and correlation coefficient (γ).
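
The objective measures named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so the definitions below are assumptions: the average prediction error is taken as the mean absolute difference between actual and predicted durations, the standard deviation is computed over those absolute errors, and the correlation coefficient is the Pearson correlation between actual and predicted durations. The function name `duration_metrics`, the `tolerance` parameter and the sample durations are hypothetical and are not taken from the paper.

```python
import math


def duration_metrics(actual, predicted, tolerance=0.25):
    """Objective measures for a duration model (assumed definitions).

    actual, predicted: equal-length sequences of durations (e.g. in ms).
    tolerance: relative deviation used for the "within X%" figure.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(actual)

    # Absolute prediction error per syllable.
    errors = [abs(a - p) for a, p in zip(actual, predicted)]

    # Average prediction error (assumed: mean absolute error).
    mu = sum(errors) / n

    # Standard deviation of the absolute errors around their mean.
    sigma = math.sqrt(sum((e - mu) ** 2 for e in errors) / n)

    # Pearson correlation between actual and predicted durations.
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted)) / n
    sd_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual) / n)
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / n)
    # Correlation is undefined when either sequence is constant.
    gamma = cov / (sd_a * sd_p) if sd_a > 0 and sd_p > 0 else float("nan")

    # Fraction of syllables predicted within `tolerance` of the actual duration.
    within = sum(1 for a, e in zip(actual, errors) if e <= tolerance * a) / n

    return {"mu": mu, "sigma": sigma, "gamma": gamma, "within": within}


# Made-up durations in milliseconds, for illustration only.
actual_ms = [100.0, 200.0, 300.0, 400.0]
predicted_ms = [110.0, 180.0, 330.0, 280.0]
print(duration_metrics(actual_ms, predicted_ms))
```

Under these assumed definitions, the `within` value is the same kind of figure as the abstract's statement that 85% of syllable durations are predicted within 25% of the actual duration.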