Expanding the vocabulary of a connectionist recognizer trained on the DARPA Resource Management corpus

  • Authors:
  • H. Lucke; F. Fallside

  • Affiliations:
  • Cambridge University Engineering Department, Cambridge, UK; Cambridge University Engineering Department, Cambridge, UK

  • Venue:
  • ICASSP '92: Proceedings of the 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 1
  • Year:
  • 1992

Abstract

It is shown how the Compositional Representation (CR) [5], previously used for lexical access from sub-word recognizers with a relatively small word vocabulary, can be extended to much larger vocabularies without further training. This is demonstrated on the DARPA Resource Management database where, using sub-word units as input, words are represented distributively over a fixed number of units and classified using a simple network. Initially the architecture is trained on 147 words, achieving an accuracy of 91.2%. Then, leaving the recognizer unchanged, it is shown how additional output units can be added to the network to increase the vocabulary to the complete set of 975 phonetically distinct words. On this extended vocabulary the performance dropped to 66%, but this drop is less than the drop expected from the increase in perplexity. Further improvement would be achieved by improving the performance on the original data set.
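
The idea of growing the vocabulary by adding output units over a fixed-size word representation can be illustrated with a minimal sketch. It is not the authors' implementation. The bigram-hashing `encode` function is a hypothetical stand-in for the CR of [5], the value of `NUM_UNITS` is arbitrary, and setting each new output unit's weights directly from the word's encoding is an assumption, since the abstract does not say how the added units are initialised.

```python
import math
import zlib

NUM_UNITS = 64  # fixed size of the distributed word representation (assumed value)


def encode(phones, num_units=NUM_UNITS):
    """Hypothetical stand-in for the CR: hash phone bigrams into a
    fixed-length, unit-norm vector."""
    vec = [0.0] * num_units
    padded = ["#"] + list(phones) + ["#"]  # '#' marks the word boundaries
    for left, right in zip(padded, padded[1:]):
        idx = zlib.crc32(f"{left}-{right}".encode()) % num_units
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


class WordClassifier:
    """Simple network with one output unit per word; each unit scores the
    encoded input by a dot product with its weight vector."""

    def __init__(self):
        self.words = []
        self.weights = []  # one weight vector per output unit

    def add_word(self, word, phones):
        # Assumption: a new word only appends an output unit whose weights
        # are the word's own encoding; the encoder itself is left unchanged.
        self.words.append(word)
        self.weights.append(encode(phones))

    def classify(self, phones):
        x = encode(phones)
        scores = [sum(w * xi for w, xi in zip(wv, x)) for wv in self.weights]
        return self.words[scores.index(max(scores))]


# Start with a small vocabulary, then extend it without touching the encoder.
clf = WordClassifier()
clf.add_word("ship", ["sh", "ih", "p"])
clf.add_word("port", ["p", "ao", "r", "t"])
clf.add_word("speed", ["s", "p", "iy", "d"])  # added later as a new output unit
print(clf.classify(["s", "p", "iy", "d"]))
```

Because the representation has a fixed number of units, the input side of the classifier never changes size in this sketch; only the list of output units grows as words are added.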