A time-frequency segmental neural network for phoneme recognition

Authors:
Anjan Basu; Torbjørn Svendsen
Affiliations:
Department of Telecommunications, Norwegian Institute of Technology, Trondheim, Norway (both authors)
Venue:
ICASSP'93 Proceedings of the 1993 IEEE international conference on Acoustics, speech, and signal processing: plenary, special, audio, underwater acoustics, VLSI, neural networks - Volume I
Year:
1993

References

Links Between Markov Models and Multilayer Perceptrons. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Speech recognition using segmental neural nets. ICASSP'92 Proceedings of the 1992 IEEE international conference on Acoustics, speech and signal processing - Volume 1.

Abstract

This paper proposes a Time-Frequency Segmental Neural Network (TFSNN) that classifies phonemes from the two-dimensional time-frequency distribution of the entire phonetic segment. Its architecture, similar to those used for optical character recognition (OCR) [2], provides local shift invariance along both the time and frequency axes. Because the TFSNN performs significantly better than the segmental neural network (SNN) [1], it can replace the SNN in a hybrid HMM/ANN system for automatic speech recognition. The TFSNN also trains in less time, since it uses far fewer connection weights than the SNN.
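The abstract attributes the TFSNN's behaviour to an OCR-style architecture: shared weights give local shift invariance along time and frequency and keep the weight count small. The abstract gives no layer sizes or connectivity, so the following is only a minimal sketch of that general idea, not the TFSNN itself. It assumes a single shared 3x3 kernel applied across a toy 8-frame by 8-bin segment, followed by 2x2 max pooling (max pooling is a swapped-in simplification). All function names (`conv2d_valid`, `max_pool`, `argmax2d`, `impulse`, `n_weights_shared`, `n_weights_dense`), the kernel values and the layer sizes are hypothetical and chosen for illustration.

```python
"""Illustrative sketch only (not the paper's TFSNN): one shared-weight
convolution plus max pooling over a toy time-frequency segment.
Every size, weight and name here is a hypothetical choice."""
from typing import List, Tuple

Matrix = List[List[float]]  # rows = time frames, columns = frequency bins


def conv2d_valid(x: Matrix, kernel: Matrix, bias: float = 0.0) -> Matrix:
    """Apply the same kernel at every (time, frequency) position ('valid' borders)."""
    kt, kf = len(kernel), len(kernel[0])
    n_t, n_f = len(x) - kt + 1, len(x[0]) - kf + 1
    return [
        [
            bias + sum(kernel[i][j] * x[t + i][f + j]
                       for i in range(kt) for j in range(kf))
            for f in range(n_f)
        ]
        for t in range(n_t)
    ]


def max_pool(x: Matrix, pt: int, pf: int) -> Matrix:
    """Non-overlapping pt x pf max pooling over time and frequency."""
    return [
        [
            max(x[t + i][f + j] for i in range(pt) for j in range(pf))
            for f in range(0, len(x[0]) - pf + 1, pf)
        ]
        for t in range(0, len(x) - pt + 1, pt)
    ]


def argmax2d(x: Matrix) -> Tuple[int, int]:
    """(row, column) of the largest entry; ties go to the first one found."""
    return max(((t, f) for t in range(len(x)) for f in range(len(x[0]))),
               key=lambda tf: x[tf[0]][tf[1]])


def impulse(n_frames: int, n_bins: int, t: int, f: int) -> Matrix:
    """Toy 'segment': zero everywhere except one unit of energy at (t, f)."""
    seg = [[0.0] * n_bins for _ in range(n_frames)]
    seg[t][f] = 1.0
    return seg


def n_weights_shared(kt: int, kf: int, n_maps: int) -> int:
    """Weights plus biases of a layer with n_maps shared kt x kf kernels."""
    return n_maps * (kt * kf + 1)


def n_weights_dense(n_frames: int, n_bins: int, n_hidden: int) -> int:
    """Weights plus biases of a fully connected layer over the whole segment."""
    return n_hidden * (n_frames * n_bins + 1)


# Hypothetical blob-detecting kernel, shared across all positions.
KERNEL = [[0.5, 1.0, 0.5],
          [1.0, 2.0, 1.0],
          [0.5, 1.0, 0.5]]

if __name__ == "__main__":
    # Move the same spectral "event" by one frame or one bin and see where
    # the strongest pooled response lands.
    for t, f in [(2, 1), (2, 2), (1, 1), (2, 3)]:
        pooled = max_pool(conv2d_valid(impulse(8, 8, t, f), KERNEL), 2, 2)
        print((t, f), "-> pooled peak at", argmax2d(pooled))
    # Compare weight counts for illustrative sizes.
    print(n_weights_shared(3, 3, 4), "vs", n_weights_dense(16, 16, 32))
```

Running it shows the event at (2, 1), (2, 2) and (1, 1) all peaking in pooled cell (0, 0). This is the local invariance to one-frame or one-bin shifts. Moving it to (2, 3) crosses a pooling boundary, so the peak moves to (0, 1). The invariance is therefore local, not global. The last line contrasts 40 shared weights with 8224 for a dense layer of the illustrative size. This mirrors, in spirit only, the abstract's point that few connection weights shorten training.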