We have been investigating for some time the use of a layered modular/ensemble neural network architecture for acoustic modelling. In the particular instantiation investigated so far, this architecture decomposes the task of acoustic modelling by phone. In the first layer, at least one multilayer perceptron (a 'primary detector') is trained to discriminate each phone; in a second layer, the first-layer outputs are combined into phone posterior probabilities by further MLPs. In this paper we show that our approach provides good acoustic modelling in a series of experiments on the TIMIT speech corpus. Firstly, we show that the decomposition itself provides a gain through greater precision in MLP training. Secondly, we show that primary detectors trained on different front-ends can be profitably combined; our analysis of the correlations between different detectors for the same phone shows that different front-ends contribute some independent information. Thirdly, we show how to exploit information from a wide context within our architectural framework, and that this yields performance equivalent to the best context-dependent acoustic modelling systems.
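The two-layer structure described above can be illustrated with a minimal sketch. Everything here is illustrative, not taken from the paper: the class names `DetectorMLP` and `CombinerMLP`, the toy phone inventory of 4 classes (TIMIT systems typically use 39-61), the feature dimension, and the untrained random weights are all assumptions. The sketch only shows the data flow: one one-vs-rest detector MLP per phone in the first layer, and a second-layer MLP turning the vector of detector scores into phone posteriors.

```python
import numpy as np

rng = np.random.default_rng(0)

N_PHONES = 4   # toy phone inventory (illustrative; real systems use far more)
N_FEATS = 13   # e.g. one frame of cepstral features
HIDDEN = 8     # hidden-layer size (arbitrary)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class DetectorMLP:
    """First-layer 'primary detector': a one-vs-rest MLP for a single phone.
    Weights are random here; in practice each detector is trained to
    discriminate its phone from all others."""
    def __init__(self, n_in, n_hid):
        self.W1 = rng.normal(scale=0.1, size=(n_in, n_hid))
        self.W2 = rng.normal(scale=0.1, size=(n_hid,))

    def forward(self, x):
        h = np.tanh(x @ self.W1)
        return sigmoid(h @ self.W2)   # scalar detector score in (0, 1)

class CombinerMLP:
    """Second layer: maps the vector of detector scores (possibly from
    several detectors per phone / several front-ends) to phone posteriors."""
    def __init__(self, n_in, n_out):
        self.W = rng.normal(scale=0.1, size=(n_in, n_out))

    def forward(self, scores):
        return softmax(scores @ self.W)

# One detector per phone, then a single combiner over all detector outputs.
detectors = [DetectorMLP(N_FEATS, HIDDEN) for _ in range(N_PHONES)]
combiner = CombinerMLP(N_PHONES, N_PHONES)

frame = rng.normal(size=N_FEATS)                       # one acoustic frame
scores = np.array([d.forward(frame) for d in detectors])
posteriors = combiner.forward(scores)                  # sums to 1
```

With detectors trained on different front-ends, the `scores` vector would simply grow (several entries per phone), leaving the combiner to weight the partially independent evidence; the softmax output is what an HMM decoder would consume as scaled likelihoods.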