Binaural speech separation using recurrent timing neural networks for joint F0-localisation estimation

  • Authors:
  • Stuart N. Wrigley; Guy J. Brown

  • Affiliations:
  • Department of Computer Science, University of Sheffield, Sheffield, United Kingdom

  • Venue:
  • MLMI'07: Proceedings of the 4th International Conference on Machine Learning for Multimodal Interaction
  • Year:
  • 2007

Abstract

A speech separation system is described in which sources are represented in a joint interaural time difference-fundamental frequency (ITD-F0) cue space. Traditionally, recurrent timing neural networks (RTNNs) have been used only to extract periodicity information; in this study, this type of network is extended in two ways. Firstly, a coincidence detector layer is introduced, each node of which is tuned to a particular ITD; secondly, the RTNN is extended to two dimensions to allow periodicity analysis to be performed at each best ITD. Thus, one axis of the RTNN represents F0 and the other ITD, allowing sources to be segregated on the basis of their separation in ITD-F0 space. Source segregation is performed within individual frequency channels, without recourse to the across-channel estimates of F0 or ITD that are commonly used in auditory scene analysis approaches. The system is evaluated on spatialised speech signals using energy-based metrics and automatic speech recognition.
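The joint ITD-F0 analysis described above can be illustrated with a minimal, single-channel sketch. This is not the paper's RTNN implementation (which operates per gammatone frequency channel with recurrent delay loops); it is a hypothetical stand-in in which each node of a 2-D grid is tuned to one (ITD, F0 period) pair: one ear's signal is delayed by the candidate ITD and multiplied with the other (half-wave rectified, loosely analogous to a coincidence detector's firing), and periodicity analysis is then run on that coincidence output at each candidate period. The function name and all parameters are illustrative assumptions.

```python
import numpy as np

def itd_f0_map(left, right, fs, max_itd=16, min_f0=80.0, max_f0=300.0):
    """Joint ITD-F0 activity map: a simplified, broadband stand-in for the
    two-dimensional recurrent timing network sketched in the abstract.
    ITDs are expressed in samples; F0 candidates as periods in samples.
    """
    n = len(left)
    itds = np.arange(-max_itd, max_itd + 1)               # candidate lags (samples)
    periods = np.arange(int(fs / max_f0), int(fs / min_f0) + 1)
    activity = np.zeros((len(itds), len(periods)))
    for i, d in enumerate(itds):
        # Align the two ears at this candidate ITD.
        if d >= 0:
            l, r = left[d:], right[:n - d]
        else:
            l, r = left[:n + d], right[-d:]
        # Coincidence-detector output: rectified product of aligned ears.
        c = np.maximum(l * r, 0.0)
        for j, p in enumerate(periods):
            # Periodicity analysis at this best-ITD: autocorrelation of the
            # coincidence output at the candidate F0 period.
            activity[i, j] = np.dot(c[:-p], c[p:])
    return itds, periods, activity
```

Under this sketch, a single periodic source produces a peak in the activity grid at its (ITD, F0) coordinates; two spatially separated talkers with different F0s would form two distinct peaks, which is the basis on which the paper segregates sources in ITD-F0 space.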