Modular construction of time-delay neural networks for speech recognition
Neural Computation
We present a comparison of several Time-Delay Neural Network (TDNN) [1] architectures as the preprocessing step for speaker-dependent continuous-speech recognition systems based on Hidden Markov Models (HMMs). We define a modular TDNN architecture on the basis of acoustic-phonetic knowledge, in which each sub-network is trained on a different subset of phonemes. This allows us to define a hierarchical tree structure of sub-networks, which in turn provides a framework for enlarging the number of outputs by defining context-dependent sub-networks. We also compare two methods for integrating TDNNs into an HMM framework: a discrete and a continuous integration. For speaker JWSA of the speaker-dependent DARPA RM1 database, with context-independent phonemes, a word error rate of 21.3% is obtained without grammar, and 4.6% with the DARPA word-pair grammar (perplexity 60).
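To make the time-delay idea concrete, the following is a minimal sketch of one TDNN layer in the style of [1]: each output frame combines the input frames at several fixed delays, i.e. a 1-D convolution over time. The function name, feature sizes, and delay set are illustrative assumptions, not the architecture used in the paper.

```python
import numpy as np

def tdnn_layer(frames, weights, bias, delays):
    """One time-delay layer (illustrative sketch, not the paper's exact net).

    frames:  (T, F) sequence of acoustic feature frames
    weights: (len(delays), F, H) one weight matrix per delay
    bias:    (H,) shared bias for the H hidden units
    delays:  list of integer frame offsets, e.g. [0, 1, 2]
    returns: (T - max(delays), H) squashed activations
    """
    T = frames.shape[0]
    span = max(delays)
    out = np.zeros((T - span, weights.shape[2]))
    for t in range(T - span):
        pre = bias.copy()
        for i, d in enumerate(delays):
            # the same weights are applied at every time step (tied weights)
            pre += frames[t + d] @ weights[i]
        out[t] = np.tanh(pre)  # sigmoid-like squashing nonlinearity
    return out

# Tiny usage example with random frames: 10 frames of 16 features
rng = np.random.default_rng(0)
acts = tdnn_layer(rng.standard_normal((10, 16)),
                  0.1 * rng.standard_normal((3, 16, 8)),
                  np.zeros(8),
                  delays=[0, 1, 2])
print(acts.shape)  # (8, 8): two frames lost to the delay span
```

In a modular setup like the one described above, several such sub-networks, each trained on its own phoneme subset, would feed a higher-level combination stage.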