A computational model of binaural speech recognition: Role of across-frequency vs. within-frequency processing and internal noise

Authors:
Kalle J. Palomäki; Guy J. Brown
Affiliations:
Aalto University School of Science and Technology, Department of Computer and Information Science, Adaptive Informatics Research Centre, P.O. Box 15400, FI-00076 Aalto, Finland; Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield S1 4DP, United Kingdom
Venue:
Speech Communication
Year:
2011

Abstract

This study describes a model of binaural speech recognition that is tested against psychoacoustic findings on binaural speech intelligibility in noise. The model combines models of the auditory periphery and the binaural pathway with a recognizer that identifies speech from glimpses using the missing data approach, which allows the speech reception thresholds (SRTs) of the model and of listeners to be compared. The binaural advantage that arises when the interaural time differences (ITDs) of the target and masker differ is modelled using the equalization-cancellation (EC) mechanism, applied either independently within each frequency channel or across all channels. The model is tested using a stimulus paradigm in which the target speech and noise interference are split into low- and high-frequency bands, so that the ITD in each band can be varied independently. The match between the model and listener data is quantified by a normalized SRT distance and a correlation metric, which show a slightly better match for the within-channel model (SRT: 0.5 dB, correlation: 0.94) than for the across-channel model (SRT: 0.7 dB, correlation: 0.90). However, as the differences between the two approaches are small and non-significant, our results suggest that listeners exploit ITD via a mechanism that is neither fully frequency-dependent nor fully frequency-independent.
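
The contrast between within-channel and across-channel EC processing can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`ec_residual_energy`, `best_delays`), the circular integer-sample delay, and the sign convention are all assumptions made for illustration. The internal noise and the missing data recognizer described in the paper are omitted. The sketch assumes the masker dominates the input, so the delay that minimizes the residual energy after subtraction is taken as the equalizing delay.

```python
import numpy as np

def ec_residual_energy(left, right, delay):
    """Energy left after delaying `right` by `delay` samples and subtracting it from `left`.

    `left` and `right` have shape (n_channels, n_samples). The delay is a
    circular shift, which is a simplification chosen for this sketch.
    """
    shifted = np.roll(right, delay, axis=-1)
    return np.sum((left - shifted) ** 2, axis=-1)

def best_delays(left, right, candidate_delays, across_channels=False):
    """Pick the equalizing delay per channel, or one shared delay for all channels."""
    candidate_delays = np.asarray(candidate_delays)
    # energies[i, c]: residual energy in channel c for candidate delay i
    energies = np.stack([ec_residual_energy(left, right, d) for d in candidate_delays])
    if across_channels:
        # Across-channel variant: one delay minimizing the summed residual energy
        shared = candidate_delays[np.argmin(energies.sum(axis=1))]
        return np.full(left.shape[0], shared)
    # Within-channel variant: each channel chooses its own delay
    return candidate_delays[np.argmin(energies, axis=0)]

# Hypothetical two-channel input whose maskers carry different ITDs.
rng = np.random.default_rng(0)
left = rng.standard_normal((2, 2000))
left[0] *= 10.0  # channel 0 carries much more masker energy than channel 1
right = np.stack([np.roll(left[0], -3), np.roll(left[1], 2)])

within = best_delays(left, right, range(-5, 6))
across = best_delays(left, right, range(-5, 6), across_channels=True)
print(within, across)
```

In this toy input the two channels need different delays, so the within-channel variant can cancel the masker in both, whereas the across-channel variant has to settle on a single compromise delay. This mirrors the split-band stimulus paradigm, in which the ITDs of the low- and high-frequency bands are varied independently.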