Acoustic modelling for speech recognition in Indian languages in an agricultural commodities task domain

Authors:
Aanchan Mohan; Richard Rose; Sina Hamidi Ghalehjegh; S. Umesh
Venue:
Speech Communication
Year:
2014

Citing 7
Cited 0

Language-independent and language-adaptive acoustic modeling for speech recognition. Speech Communication.
Automatic Speech Recognition: The Development of the Sphinx Recognition System.
Multilingual Speech Processing.
Automatic clustering and generation of contextual questions for tied states in hidden Markov models. ICASSP '99: Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 01.
Speech interfaces for equitable access to information technology. Information Technologies and International Development.
Avaaj Otalo: a field study of an interactive voice forum for small farmers in rural India. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems.
The subspace Gaussian mixture model - A structured model for speech recognition. Computer Speech and Language.

Abstract

In developing speech-recognition-based services for any task domain, it is necessary to plan for supporting a growing number of languages over the life of the service. This paper considers a small-vocabulary speech recognition task in multiple Indian languages. To configure a multi-lingual system in this task domain, an experimental study is presented using data from two linguistically similar languages, Hindi and Marathi. The system is built by training a subspace Gaussian mixture model (SGMM) (Povey et al., 2011; Rose et al., 2011) under a multi-lingual scenario (Burget et al., 2010; Mohan et al., 2012a). For this study, speech data was collected from the targeted user population to develop spoken dialogue systems in an agricultural commodities task domain. Acoustic, channel and environmental mismatch between data sets from different languages is a well-known problem when building multi-lingual systems of this nature. To address it, we use a cross-corpus acoustic normalization procedure that is a variant of speaker adaptive training (SAT) (Mohan et al., 2012a). The resulting multi-lingual system provides the best speech recognition performance for both languages. Further, the effect of sharing "similar" context-dependent states from the Marathi language on the Hindi speech recognition performance is presented.
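
The parameter sharing behind a multi-lingual SGMM can be illustrated with a minimal sketch. It assumes the standard SGMM parameterization, in which each state j has a low-dimensional vector v_j, the mean of shared Gaussian i in that state is M_i v_j, and its mixture weight is a softmax over w_i . v_j. In the multi-lingual setting it further assumes that the M_i and w_i are tied across languages while the state vectors stay language-specific. The function names, toy dimensions and numbers below are hypothetical. Substates, covariances and the SAT-style normalization are left out, so this is not the system described in the paper.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def sgmm_state(shared, v_j):
    """Derive the means and mixture weights of one state from its vector v_j.

    shared["M"][i] is the projection matrix M_i and shared["w"][i] the weight
    projection w_i of shared Gaussian i. Both are assumed tied across languages.
    """
    means = [matvec(M_i, v_j) for M_i in shared["M"]]  # mu_ji = M_i v_j
    logits = [sum(a * b for a, b in zip(w_i, v_j)) for w_i in shared["w"]]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    weights = [e / total for e in exps]  # softmax over w_i . v_j
    return means, weights

# Toy sizes (hypothetical): 2 shared Gaussians, 3-dim features, 2-dim subspace.
shared = {
    "M": [
        [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
        [[0.5, -0.5], [2.0, 0.0], [0.0, 1.0]],
    ],
    "w": [[0.2, -0.1], [-0.3, 0.4]],
}

# Language-specific state vectors; the state names are placeholders.
state_vectors = {
    "hindi": {"state_0": [1.0, 2.0]},
    "marathi": {"state_0": [0.5, -1.0]},
}

hindi_means, hindi_weights = sgmm_state(shared, state_vectors["hindi"]["state_0"])
marathi_means, marathi_weights = sgmm_state(shared, state_vectors["marathi"]["state_0"])
```

Because both languages read the same `shared` dictionary, data from either language would contribute to estimating the M_i and w_i in training, while each state only needs its own small vector. This is one reason such a model is attractive when per-language data is limited.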