Experiments in speech recognition using a modular MLP architecture for acoustic modelling

  • Authors:
  • T. Jeff Reynolds; Christos A. Antoniou

  • Affiliations:
  • Department of Computer Science, University of Essex, Colchester CO4 3SQ, UK (both authors)

  • Venue:
  • Information Sciences—Informatics and Computer Science: An International Journal, special issue: Spoken language analysis, modeling and recognition: statistical and adaptive connectionist approaches
  • Year:
  • 2003

Abstract

For some time we have been investigating a layered modular/ensemble neural network architecture for acoustic modelling. In the particular instantiation investigated so far, this architecture decomposes the task of acoustic modelling by phone: in a first layer, at least one multilayer perceptron (MLP), or 'primary detector', is trained to discriminate each phone, and in a second layer the outputs of these detectors are combined into phone posterior probabilities by further MLPs. In this paper we show, through a series of experiments on the TIMIT speech corpus, that this approach provides good acoustic modelling. Firstly, we show that the decomposition itself provides a gain through greater precision in MLP training. Secondly, we show that primary detectors trained on different front-ends can be profitably combined; our analysis of the correlations between different detectors for the same phone shows that the different front-ends supply some independent information. Thirdly, we show how information from a wide context can be employed within our architectural framework, and that this gives performance equivalent to the best context-dependent acoustic modelling systems.
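
As a rough illustration of the two-layer modular arrangement described in the abstract, the sketch below builds one small 'primary detector' MLP per phone and a further MLP that combines their scalar outputs into phone posterior probabilities. The phone-set size, feature dimension, layer widths, activations, and the use of PyTorch are all assumptions made for illustration; the paper's actual detectors, front-ends, and training procedure are not reproduced here.

```python
# Hypothetical sketch of a layered modular MLP acoustic model of the kind
# described above. Sizes and framework choices are illustrative assumptions,
# not the configuration used in the paper.
import torch
import torch.nn as nn

N_PHONES = 39   # assumed phone-set size (e.g. a folded TIMIT set)
FEAT_DIM = 26   # assumed front-end feature dimension per frame
HIDDEN = 64     # assumed hidden-layer width

class PrimaryDetector(nn.Module):
    """First layer: one small MLP trained to discriminate a single phone."""
    def __init__(self, feat_dim=FEAT_DIM, hidden=HIDDEN):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden),
            nn.Sigmoid(),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),  # scalar detection score for "its" phone
        )

    def forward(self, x):
        return self.net(x)

class ModularAcousticModel(nn.Module):
    """Second layer: a further MLP combines all detector outputs into
    phone posterior probabilities."""
    def __init__(self, n_phones=N_PHONES):
        super().__init__()
        self.detectors = nn.ModuleList(PrimaryDetector() for _ in range(n_phones))
        self.combiner = nn.Sequential(
            nn.Linear(n_phones, HIDDEN),
            nn.Sigmoid(),
            nn.Linear(HIDDEN, n_phones),
            nn.Softmax(dim=-1),  # posteriors over phones
        )

    def forward(self, frames):
        # frames: (batch, FEAT_DIM) acoustic feature vectors
        scores = torch.cat([d(frames) for d in self.detectors], dim=-1)
        return self.combiner(scores)

if __name__ == "__main__":
    model = ModularAcousticModel()
    posteriors = model(torch.randn(8, FEAT_DIM))
    print(posteriors.shape)  # torch.Size([8, 39]); each row sums to 1
```

Combining primary detectors trained on different front-ends, as in the second set of experiments, would then amount to feeding the second-layer combiner the concatenated scores from several such detector banks rather than from a single one.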