This paper explores pitch synchronous and glottal closure (GC) based spectral features for analyzing the language-specific information present in speech. Instants of significant excitation (ISE) are used to determine the pitch cycles (for pitch synchronous analysis) and the GC regions. The ISE correspond to the instants of glottal closure (epochs) in voiced speech, and to random excitations, such as burst onsets, in unvoiced speech. The language-specific information in the proposed features is analyzed on the Indian language speech database IITKGP-MLILSC, with Gaussian mixture models used to capture that information. The proposed pitch synchronous and GC based spectral features are evaluated through language recognition studies. The results indicate that language recognition performance is better with pitch synchronous and GC based spectral features than with conventional spectral features derived through block processing. GC based spectral features are also found to be more robust against degradation due to background noise. The performance of the proposed features is additionally analyzed on the standard Oregon Graduate Institute Multi-Language Telephone-based Speech (OGI-MLTS) database.
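The difference between block processing and the two epoch-anchored analyses can be illustrated with a minimal sketch. It assumes the epoch locations (ISE) have already been estimated by some other method and are given as sample indices. The function names and the `fraction=0.3` setting for the GC region are hypothetical choices made for illustration, not values taken from the paper. Spectral features would then be computed from each returned frame.

```python
import numpy as np

def pitch_synchronous_frames(signal, epochs):
    """One frame per pitch cycle: the samples between consecutive epochs."""
    epochs = np.asarray(epochs, dtype=int)
    return [signal[start:end] for start, end in zip(epochs[:-1], epochs[1:])]

def glottal_closure_frames(signal, epochs, fraction=0.3):
    """Initial part of each pitch cycle, starting at the epoch.

    `fraction` is an illustrative assumption for how much of the cycle
    is treated as the glottal closure region.
    """
    epochs = np.asarray(epochs, dtype=int)
    frames = []
    for start, end in zip(epochs[:-1], epochs[1:]):
        length = max(1, int(round(fraction * (end - start))))
        frames.append(signal[start:start + length])
    return frames

# Toy input: a ramp signal with hand-picked epoch positions (not real speech).
signal = np.arange(100, dtype=float)
epochs = [0, 40, 70, 100]

ps_frames = pitch_synchronous_frames(signal, epochs)
gc_frames = glottal_closure_frames(signal, epochs)
print([len(f) for f in ps_frames])
print([len(f) for f in gc_frames])
```

Unlike fixed-length block processing, the frame length here follows the local pitch period, so every frame is aligned to the same point in the glottal cycle.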