Auditory nerve representation as a front-end for speech recognition in a noisy environment

  • Authors:
  • Oded Ghitza

  • Venue:
  • Computer Speech and Language
  • Year:
  • 1986

Abstract

We describe a computational model based on the temporal characteristics of the information in auditory nerve-fiber firing patterns. The model produces a frequency-domain representation of the input signal in terms of the ensemble histogram of the inverse of the interspike intervals, measured from firing patterns generated by a simulated nerve-fiber array. The nerve-fiber mechanism is modeled by a multi-level-crossing detector at the output of each cochlear filter. We use 85 cochlear filters, equally spaced on a log-frequency scale from 200 Hz to 3200 Hz, and the level crossings are measured at positive threshold levels that are uniformly distributed on a log scale. The resulting Ensemble Interval Histogram (EIH) pseudo-spectrum has two main properties: (1) fine spectral details are well preserved in the low-frequency region but become fuzzy at the high-frequency end; (2) the EIH spectrum is more robust to noise than the traditional Fourier spectrum. This representation of speech has been used as a front-end to a Dynamic Time Warping (DTW), speaker-dependent, isolated-word recognizer. The database consisted of a 39-word alpha-digits vocabulary spoken by two male and two female speakers at different levels of additive white noise. In the noise-free case, the performance of the EIH-based front-end is comparable to that of a conventional Fast Fourier Transform (FFT)-based front-end. In the presence of noise, however, the EIH-based front-end is more robust: as the noise increases, its recognition scores drop more slowly than those of the FFT-based front-end, and the gap between the two widens as the SNR decreases. Quantitatively, with the EIH-based front-end the recognizer achieves a given recognition score at global SNR values that are between 5 dB and 15 dB lower.
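
The processing chain described in the abstract (filter bank, multi-level-crossing detectors, histogram of inverse intervals) can be illustrated with a minimal Python sketch. It is not the paper's implementation. The abstract does not specify the cochlear filter shapes, the number or values of the threshold levels, or the histogram bin width, so the following are all assumptions made for illustration: a generic second-order band-pass filter with Q = 4 stands in for each cochlear filter, five thresholds are log-spaced between 0.02 and 0.5, and the bins are 50 Hz wide. The function names (`log_spaced`, `bandpass`, `upward_crossings`, `eih`) are hypothetical. Only the 85 channels and the 200 Hz to 3200 Hz log-frequency spacing come from the text.

```python
import math

def log_spaced(lo, hi, n):
    """Return n values equally spaced on a log scale from lo to hi inclusive."""
    ratio = (hi / lo) ** (1.0 / (n - 1))
    return [lo * ratio ** i for i in range(n)]

def bandpass(x, fc, fs, q=4.0):
    """Second-order band-pass biquad with unity peak gain.
    ASSUMPTION: a generic stand-in, not the paper's cochlear filter."""
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def upward_crossings(y, level, fs):
    """Times (in seconds) at which y crosses `level` going upward,
    located by linear interpolation between adjacent samples."""
    times = []
    for n in range(1, len(y)):
        if y[n - 1] < level <= y[n]:
            frac = (level - y[n - 1]) / (y[n] - y[n - 1])
            times.append((n - 1 + frac) / fs)
    return times

def eih(x, fs, n_filters=85, f_lo=200.0, f_hi=3200.0,
        levels=None, bin_hz=50.0):
    """Ensemble histogram of inverse level-crossing intervals.
    Bin k is centred on k * bin_hz. Threshold values and bin width
    are ASSUMPTIONS; the abstract does not give them."""
    if levels is None:
        levels = log_spaced(0.02, 0.5, 5)  # positive, log-spaced thresholds
    n_bins = int(fs / 2.0 / bin_hz) + 1
    hist = [0] * n_bins
    for fc in log_spaced(f_lo, f_hi, n_filters):
        y = bandpass(x, fc, fs)
        for level in levels:
            t = upward_crossings(y, level, fs)
            # Each interval between successive crossings votes for 1/interval.
            for t0, t1 in zip(t, t[1:]):
                k = int(1.0 / (t1 - t0) / bin_hz + 0.5)
                if k < n_bins:
                    hist[k] += 1
    return hist
```

Calling `eih(samples, fs)` on a list of floating-point samples scaled to roughly unit amplitude returns the histogram as a list of counts. For a steady pure tone, most counts would be expected to fall in the bin nearest the tone frequency, because every channel and level that the tone reaches fires at the same period.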