Robust features for multilingual acoustic modeling

Authors:
C. Santhosh Kumar; V. P. Mohandas
Affiliations:
ECE Department, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Ettimadai, Coimbatore, India; ECE Department, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Ettimadai, Coimbatore, India
Venue:
International Journal of Speech Technology
Year:
2011

Abstract

In this paper, we propose a technique for deriving robust features for multilingual acoustic modeling with hidden Markov model-Gaussian mixture model (HMM-GMM) systems. We achieve this by discriminatively combining the phonetic contexts of the target languages (the languages in the multilingual system). Phonetic context is captured using a wide temporal context of the features, and the dimensionality of the resulting feature set is reduced to suit the HMM-GMM implementation using a neural network with a bottleneck in one of its hidden layers. The output of the bottleneck layer, taken before the non-linearity, is the new feature. Since the features are optimized for the target languages in the multilingual recognizer, we refer to them as Target Languages Oriented Features (TLOF). We perform our experiments on two of the most widely spoken Indian languages, Hindi and Tamil. TLOF offers significant performance improvements over both monolingual and multilingual phone recognizers that use Mel-frequency cepstral coefficients (MFCC), which indicates that TLOF can help share data across languages. TLOF also improves the performance of monolingual acoustic models compared to systems using MFCC.
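
The feature extraction path described in the abstract (stack a wide temporal context, pass it through a network with a bottleneck, and read the bottleneck output before its non-linearity) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the 39-dimensional input, the context of 4 frames on each side, the layer sizes, the tanh activation, and the 90-class output layer are all hypothetical, and the weights are random rather than discriminatively trained on the phone targets of the target languages.

```python
import numpy as np

def stack_context(feats, left=4, right=4):
    """Concatenate each frame with its neighbours.

    Edges are padded by repeating the first/last frame (an assumption).
    """
    n_frames, _ = feats.shape
    padded = np.pad(feats, ((left, right), (0, 0)), mode="edge")
    windows = [padded[i:i + n_frames] for i in range(left + right + 1)]
    return np.concatenate(windows, axis=1)

def init_mlp(layer_sizes, seed=0):
    """Random placeholder weights; a real system would train these."""
    rng = np.random.default_rng(seed)
    return [
        (rng.standard_normal((m, n)) * 0.1, np.zeros(n))
        for m, n in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

def bottleneck_features(x, params, bottleneck_index):
    """Forward pass that stops at the bottleneck layer and returns its
    linear output, i.e. the value before the non-linearity."""
    h = x
    for i, (w, b) in enumerate(params):
        z = h @ w + b
        if i == bottleneck_index:
            return z
        h = np.tanh(z)  # hypothetical choice of hidden activation
    raise ValueError("bottleneck_index is beyond the last layer")

# Placeholder input: 100 frames of 39-dimensional features (hypothetical).
mfcc = np.random.default_rng(1).standard_normal((100, 39))
stacked = stack_context(mfcc, left=4, right=4)

# Hypothetical topology: input -> 500 -> 39 (bottleneck) -> 500 -> 90 classes.
params = init_mlp([stacked.shape[1], 500, 39, 500, 90])
tlof = bottleneck_features(stacked, params, bottleneck_index=1)
print(stacked.shape, tlof.shape)
```

In this sketch the layers after the bottleneck matter only during training; at feature extraction time the forward pass stops at the bottleneck, and the resulting low-dimensional vectors would be handed to the HMM-GMM system in place of MFCC.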