A hierarchical language identification system for Indian languages

Authors:
S. Jothilakshmi;V. Ramalingam;S. Palanivel
Affiliations:
Department of Computer Science and Engineering, Annamalai University, Annamalainagar 608 002, India;Department of Computer Science and Engineering, Annamalai University, Annamalainagar 608 002, India;Department of Computer Science and Engineering, Annamalai University, Annamalainagar 608 002, India
Venue:
Digital Signal Processing
Year:
2012

Abstract

Automatic spoken language identification (LID) is the task of identifying the language from a short segment of speech uttered by an unknown speaker. This work develops a two-level language identification system for Indian languages using acoustic features. The first level identifies the family of the spoken language; the utterance is then passed to the second level, which identifies the particular language within that family. The performance of the system is analyzed for various acoustic features and different classifiers, and a suitable acoustic feature and pattern classification model are suggested for effective identification of Indian languages. The system has been modeled using hidden Markov models (HMM), Gaussian mixture models (GMM), and artificial neural networks (ANN). The discriminative power of the system is studied for three feature sets: mel frequency cepstral coefficients (MFCC), MFCC with delta and acceleration coefficients, and shifted delta cepstral (SDC) coefficients. LID performance is also studied as a function of the training and testing set sizes. To carry out the experiments, a new database has been created for 9 Indian languages. The GMM-based LID system using MFCC with delta and acceleration coefficients performs well, with 80.56% accuracy. The performance of the GMM-based LID system with SDC is also considerable.
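
The two-level decision described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: each class is modeled with a single diagonal-covariance Gaussian (a one-component stand-in for a full GMM), the features are synthetic 2-D vectors rather than MFCCs, and the language and family labels (`lang_a1`, `family_a`, and so on) are hypothetical placeholders, since the abstract does not name the languages or families. The class name `HierarchicalLID` and all helper names are likewise assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def fit_diag_gaussian(frames):
    """Estimate per-dimension mean and variance from a list of feature vectors."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variance so the log-likelihood stays finite.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)
           for d in range(dim)]
    return mean, var


def avg_log_likelihood(frames, model):
    """Average per-frame log-likelihood under a diagonal Gaussian."""
    mean, var = model
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)


class HierarchicalLID:
    """Two-level classifier: pick the family first, then a language within it."""

    def __init__(self, family_of):
        self.family_of = family_of      # hypothetical language -> family map
        self.family_models = {}
        self.language_models = {}

    def fit(self, frames_by_language):
        frames_by_family = defaultdict(list)
        for lang, frames in frames_by_language.items():
            self.language_models[lang] = fit_diag_gaussian(frames)
            frames_by_family[self.family_of[lang]].extend(frames)
        # Family models are trained on the pooled frames of their languages.
        for family, frames in frames_by_family.items():
            self.family_models[family] = fit_diag_gaussian(frames)
        return self

    def predict(self, frames):
        # Level 1: most likely family for the whole utterance.
        family = max(
            self.family_models,
            key=lambda f: avg_log_likelihood(frames, self.family_models[f]),
        )
        # Level 2: only languages belonging to that family compete.
        candidates = [l for l in self.language_models
                      if self.family_of[l] == family]
        language = max(
            candidates,
            key=lambda l: avg_log_likelihood(frames, self.language_models[l]),
        )
        return family, language


# Toy data: synthetic 2-D "features" clustered around assumed centers.
rng = random.Random(0)


def toy_frames(center, n):
    return [[rng.gauss(c, 1.0) for c in center] for _ in range(n)]


family_of = {"lang_a1": "family_a", "lang_a2": "family_a",
             "lang_b1": "family_b", "lang_b2": "family_b"}
centers = {"lang_a1": (0.0, 0.0), "lang_a2": (3.0, 0.0),
           "lang_b1": (20.0, 20.0), "lang_b2": (23.0, 20.0)}
train = {lang: toy_frames(c, 200) for lang, c in centers.items()}

model = HierarchicalLID(family_of).fit(train)
print(model.predict(toy_frames(centers["lang_a2"], 50)))
```

One consequence of this structure is that an error at the first level cannot be recovered at the second, because only the languages of the chosen family are scored. In a fuller system the single Gaussians would presumably be replaced by multi-component GMMs trained on MFCC, delta, and acceleration features.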