Isolated word recognition using high-order statistics and time-delay neural networks

Authors:
M. R. Ashouri
Affiliations:
-
Venue:
SPWHOS '97 Proceedings of the 1997 IEEE Signal Processing Workshop on Higher-Order Statistics (SPW-HOS '97)
Year:
1997

Citing 0
Cited 1

Spatio-Temporal organization map: a speech recognition application

ICANN'05 Proceedings of the 15th international conference on Artificial Neural Networks: biological Inspirations - Volume Part I

Quantified Score

Hi-index	0.00

Abstract

In this paper, two isolated word recognition methods for Farsi spoken digits are studied, both based on high-order statistics and a time-delay neural network (TDNN). The adopted speech recognition system consists of four modules: a preprocessor, an endpoint detector, a feature extractor and a classifier. The first method estimates the AR parameters of speech from the third- and fourth-order cumulants using the high-order Yule-Walker, W-slice and 1-D slice approaches. In the second method, statistical features are extracted from the estimated high-order probability density function (pdf) of thresholded amplitude features. For each pdf estimate, the mean, variance, third-order moment and entropy are computed. Each frame is approximately 15 ms long and yields 16 features in total. The adopted TDNN has 16 nodes in its input layer, 10 nodes in its output layer and two hidden layers. The learning rule of the adopted TDNN, which is based on the backpropagation rule, has been modified to decrease the training time. Computer simulation results obtained from recognizing 10 Farsi digits spoken by different speakers show that the first method has a better recognition rate, while the second method requires less computation.
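As an illustration of the second method's per-pdf statistics, the following is a minimal sketch and not the author's implementation. It assumes a plain amplitude histogram as the pdf estimate, which may differ from the high-order pdf of thresholded amplitude features used in the paper. The function name `frame_pdf_features` and the `num_bins` parameter are hypothetical. The abstract does not say how the 16 per-frame features are assembled, so the sketch computes only one set of four values: mean, variance, third central moment and entropy.

```python
import math


def frame_pdf_features(frame, num_bins=8):
    """Return (mean, variance, third central moment, entropy in bits)
    of a histogram pdf estimate of the amplitudes in one frame.

    Assumption: a uniform-bin histogram stands in for the paper's pdf
    estimate; num_bins is an illustrative value, not taken from the paper.
    """
    if not frame:
        raise ValueError("frame must not be empty")

    lo, hi = min(frame), max(frame)
    # Fall back to a unit bin width when every sample has the same value.
    width = (hi - lo) / num_bins or 1.0

    # Count samples per bin; the maximum sample is clamped into the last bin.
    counts = [0] * num_bins
    for x in frame:
        idx = min(int((x - lo) / width), num_bins - 1)
        counts[idx] += 1

    n = len(frame)
    probs = [c / n for c in counts]
    centers = [lo + (i + 0.5) * width for i in range(num_bins)]

    # Moments are taken over the bin centers, weighted by bin probability.
    mean = sum(p * c for p, c in zip(probs, centers))
    variance = sum(p * (c - mean) ** 2 for p, c in zip(probs, centers))
    third = sum(p * (c - mean) ** 3 for p, c in zip(probs, centers))

    # Shannon entropy of the bin probabilities; empty bins contribute zero.
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return mean, variance, third, entropy
```

Calling `frame_pdf_features(samples)` on the samples of one 15 ms frame returns a 4-tuple. Because the moments are computed from bin centers, they approximate rather than equal the sample moments, and the approximation improves as `num_bins` grows.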