We have been investigating for some time the use of a layered modular/ensemble neural network architecture for acoustic modelling. In the particular instantiation investigated so far, this architecture decomposes the task of acoustic modelling by phone. In the first layer, at least one multilayer perceptron (a 'primary detector') is trained to discriminate each phone; in a second layer, the first-layer outputs are combined into phone posterior probabilities by further MLPs. In this paper we show that our approach provides good acoustic modelling in a series of experiments on the TIMIT speech corpus. Firstly, we show that the decomposition itself provides a gain through greater precision in MLP training. Secondly, we show that primary detectors trained on different front-ends can be profitably combined; our analysis of the correlations between different detectors for the same phone shows that different front-ends contribute some independent information. Thirdly, we show how to exploit information from a wide context within our architectural framework, and that this yields performance equivalent to the best context-dependent acoustic modelling systems.
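The two-layer structure described above can be illustrated with a minimal sketch. Everything here is illustrative, not taken from the paper: the class names `DetectorMLP` and `CombinerMLP`, the toy phone inventory of 4 classes (TIMIT systems typically use 39-61), the feature dimension, and the untrained random weights are all assumptions. The sketch only shows the data flow: one one-vs-rest detector MLP per phone in the first layer, and a second-layer MLP turning the vector of detector scores into phone posteriors.

```python
import numpy as np

rng = np.random.default_rng(0)

N_PHONES = 4   # toy phone inventory (illustrative; real systems use far more)
N_FEATS = 13   # e.g. one frame of cepstral features
HIDDEN = 8     # hidden-layer size (arbitrary)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class DetectorMLP:
    """First-layer 'primary detector': a one-vs-rest MLP for a single phone.
    Weights are random here; in practice each detector is trained to
    discriminate its phone from all others."""
    def __init__(self, n_in, n_hid):
        self.W1 = rng.normal(scale=0.1, size=(n_in, n_hid))
        self.W2 = rng.normal(scale=0.1, size=(n_hid,))

    def forward(self, x):
        h = np.tanh(x @ self.W1)
        return sigmoid(h @ self.W2)   # scalar detector score in (0, 1)

class CombinerMLP:
    """Second layer: maps the vector of detector scores (possibly from
    several detectors per phone / several front-ends) to phone posteriors."""
    def __init__(self, n_in, n_out):
        self.W = rng.normal(scale=0.1, size=(n_in, n_out))

    def forward(self, scores):
        return softmax(scores @ self.W)

# One detector per phone, then a single combiner over all detector outputs.
detectors = [DetectorMLP(N_FEATS, HIDDEN) for _ in range(N_PHONES)]
combiner = CombinerMLP(N_PHONES, N_PHONES)

frame = rng.normal(size=N_FEATS)                       # one acoustic frame
scores = np.array([d.forward(frame) for d in detectors])
posteriors = combiner.forward(scores)                  # sums to 1
```

With detectors trained on different front-ends, the `scores` vector would simply grow (several entries per phone), leaving the combiner to weight the partially independent evidence; the softmax output is what an HMM decoder would consume as scaled likelihoods.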