Pitch synchronous and glottal closure based speech analysis for language recognition

Authors:
K. Sreenivasa Rao;Sudhamay Maity;V. Ramu Reddy
Affiliations:
School of Information Technology, Indian Institute of Technology Kharagpur, Kharagpur, India 721302 (all authors)
Venue:
International Journal of Speech Technology
Year:
2013

Abstract

This paper explores pitch synchronous and glottal closure (GC) based spectral features for analyzing the language-specific information present in speech. Instants of significant excitation (ISE) are used to determine the pitch cycles (for pitch synchronous analysis) and the GC regions. In voiced speech, the ISE correspond to the instants of glottal closure (epochs); in nonvoiced speech, they correspond to random excitations such as the onset of a burst. The language-specific information in the proposed features is analyzed using the Indian language speech database IITKGP-MLILSC, with Gaussian mixture models used to capture that information. The proposed pitch synchronous and GC based spectral features are evaluated through language recognition studies. The results indicate that language recognition performance is better with pitch synchronous and GC based spectral features than with conventional spectral features derived through block processing. GC based spectral features are also found to be more robust against degradations due to background noise. The performance of the proposed features is also analyzed on the standard Oregon Graduate Institute Multi-Language Telephone-based Speech (OGI-MLTS) database.
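
The difference between block processing and the epoch-anchored analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the epoch locations have already been extracted as sample indices, and every name and value in it is hypothetical: the function names, the `max_period` threshold used to skip nonvoiced gaps, and the `fraction=0.3` used to mark a GC region at the start of each cycle are illustrative choices that the abstract does not specify.

```python
from typing import List, Tuple

Frame = Tuple[int, int]  # (start_sample, end_sample), end exclusive


def pitch_synchronous_frames(epochs: List[int], max_period: int) -> List[Frame]:
    """Return one frame per pitch cycle, running from an epoch to the next.

    Gaps longer than max_period samples are skipped, on the assumption
    (made for this sketch) that they span nonvoiced speech or silence.
    """
    frames = []
    for start, end in zip(epochs, epochs[1:]):
        if 0 < end - start <= max_period:
            frames.append((start, end))
    return frames


def glottal_closure_frames(
    epochs: List[int], max_period: int, fraction: float = 0.3
) -> List[Frame]:
    """Keep only the leading part of each pitch cycle, right after the epoch.

    The fraction is an assumed illustrative value, not one from the paper.
    """
    frames = []
    for start, end in pitch_synchronous_frames(epochs, max_period):
        length = max(1, round(fraction * (end - start)))
        frames.append((start, start + length))
    return frames


# Hypothetical epoch locations (sample indices), e.g. at 8 kHz sampling.
epochs = [100, 180, 262, 340, 900, 986]
cycles = pitch_synchronous_frames(epochs, max_period=160)
gc_regions = glottal_closure_frames(epochs, max_period=160)
print(cycles)
print(gc_regions)
```

In block processing, by contrast, frames have a fixed length and shift regardless of where the epochs fall. In this sketch the frame boundaries follow the excitation instants instead, so each frame covers exactly one pitch cycle or the assumed GC portion of it. Spectral features would then be computed per frame and modeled with one Gaussian mixture model per language.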