Deep and Wide: Multiple Layers in Automatic Speech Recognition

Authors:
N. Morgan
Affiliations:
Int. Comput. Sci. Inst., Berkeley, CA, USA
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2012

Citing 0
Cited 2

Recognition of subsampled speech using a modified Mel filter bank
Computers and Electrical Engineering

Advanced classification approach for neuronal phoneme recognition system based on efficient constructive training algorithm
International Journal of Speech Technology

Abstract

This paper reviews a line of research carried out over the last decade in speech recognition assisted by discriminatively trained feedforward networks. The particular focus is on the use of multiple layers of processing preceding the hidden Markov model-based decoding of word sequences. Emphasis is placed on the use of multiple streams of high-dimensional (wide) layers, which have proven useful for this purpose. The paper concludes that while deep processing structures can provide improvements for this kind of system, the choice of features and the structure in which they are incorporated, including layer width, can also be significant factors.