A large-vocabulary continuous speech recognition system for Hindi

Authors:
M. Kumar;N. Rajput;A. Verma
Affiliations:
IBM Research Division, IBM India Research Laboratory, Block I, Indian Institute of Technology (IIT), Hauz Khas, New Delhi 110016;IBM Research Division, IBM India Research Laboratory, Block I, Indian Institute of Technology (IIT), Hauz Khas, New Delhi 110016;IBM Research Division, IBM India Research Laboratory, Block I, Indian Institute of Technology (IIT), Hauz Khas, New Delhi 110016
Venue:
IBM Journal of Research and Development
Year:
2004

Citing 1
Cited 2

Towards language independent acoustic modeling

ICASSP '00 Proceedings of the Acoustics, Speech, and Signal Processing, 2000. on IEEE International Conference - Volume 02

Hybrid wavelet based LPC features for Hindi speech recognition

International Journal of Information and Communication Technology
A Hindi speech recognizer for an agricultural video search application

Proceedings of the 3rd ACM Symposium on Computing for Development

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this paper we present two new techniques that have been used to build a large-vocabulary continuous Hindi speech recognition system. We present a technique for fast bootstrapping of initial phone models of a new language. The training data for the new language is aligned using an existing speech recognition engine for another language. This aligned data is used to obtain the initial acoustic models for the phones of the new language. Following this approach requires less training data. We also present a technique for generating baseforms (phonetic spellings) for phonetic languages such as Hindi. As is inherent in phonetic languages, rules generally capture the mapping of spelling to phonemes very well. However, deep linguistic knowledge is required to write all possible rules, and there are some ambiguities in the language that are difficult to capture with rules. On the other hand, pure statistical techniques for base and generation require large amounts of training data that are not readily available. We propose a hybrid approach that combines rule-based and statistical approaches in a two-step fashion. We evaluate the performance of the proposed approaches through various phonetic classification and recognition experiments.