This paper explores spectral and prosodic features extracted at different levels for analyzing the language-specific information present in speech. Spectral features are extracted from 20 ms frames (block processing), individual pitch cycles (pitch-synchronous analysis), and glottal closure regions, and are used to discriminate between languages. In addition to spectral features, prosodic features extracted at the syllable, tri-syllable, and multi-word (phrase) levels are proposed for capturing language-specific information. Language-specific prosody is represented by intonation, rhythm, and stress features at the syllable and tri-syllable (word) levels, whereas at the multi-word (phrase) level it is represented by temporal variations in fundamental frequency (F0 contour), syllable durations, and temporal variations in intensity (energy contour). The proposed features are analyzed on the Indian language speech database IITKGP-MLILSC. Gaussian mixture models are used to capture the language-specific information from the proposed features. The evaluation results indicate that language identification performance improves when the features are combined. The performance of the proposed features is also analyzed on the standard Oregon Graduate Institute Multi-Language Telephone-based Speech (OGI-MLTS) database.
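As a rough illustration of how Gaussian mixture models can be used for language identification, the sketch below scores a sequence of feature vectors against one GMM per language and picks the language with the highest average log-likelihood. It is a minimal sketch under stated assumptions, not the paper's implementation: the diagonal covariances, the toy model parameters, the two-dimensional feature vectors, and the names `log_gaussian_diag`, `gmm_log_likelihood`, `identify_language`, `lang_a`, and `lang_b` are all hypothetical. In practice the mixture parameters would be trained on the extracted spectral or prosodic features, for example with expectation-maximization.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log p(x | GMM), using log-sum-exp over the weighted components.

    gmm is a list of (weight, mean, var) tuples (an assumed layout).
    """
    terms = [
        math.log(w) + log_gaussian_diag(x, mean, var)
        for w, mean, var in gmm
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def identify_language(frames, models):
    """Return (best_language, scores).

    scores maps each language to the average per-frame log-likelihood
    of the feature vectors under that language's GMM.
    """
    scores = {
        lang: sum(gmm_log_likelihood(x, gmm) for x in frames) / len(frames)
        for lang, gmm in models.items()
    }
    return max(scores, key=scores.get), scores


# Toy, hand-picked models (hypothetical): two components per language.
models = {
    "lang_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "lang_b": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}

# Toy feature vectors standing in for spectral or prosodic features.
frames = [[0.2, -0.1], [0.9, 1.1], [0.4, 0.6]]

best, scores = identify_language(frames, models)
print(best, scores)
```

Averaging the log-likelihood over frames keeps scores comparable across utterances of different lengths. Scores obtained from different feature types could in principle be combined before the final decision, although the abstract does not state which combination scheme is used.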