Binaural speech separation using recurrent timing neural networks for joint F0-localisation estimation

  • Authors:
  • Stuart N. Wrigley; Guy J. Brown

  • Affiliations:
  • Department of Computer Science, University of Sheffield, Sheffield, United Kingdom

  • Venue:
  • MLMI'07: Proceedings of the 4th International Conference on Machine Learning for Multimodal Interaction
  • Year:
  • 2007

Abstract

A speech separation system is described in which sources are represented in a joint interaural time difference-fundamental frequency (ITD-F0) cue space. Traditionally, recurrent timing neural networks (RTNNs) have been used only to extract periodicity information; in this study, this type of network is extended in two ways. Firstly, a coincidence detector layer is introduced, each node of which is tuned to a particular ITD; secondly, the RTNN is extended to two dimensions to allow periodicity analysis to be performed at each best ITD. Thus, one axis of the RTNN represents F0 and the other ITD, allowing sources to be segregated on the basis of their separation in ITD-F0 space. Source segregation is performed within individual frequency channels, without recourse to the across-channel estimates of F0 or ITD that are commonly used in auditory scene analysis approaches. The system is evaluated on spatialised speech signals using energy-based metrics and automatic speech recognition.
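The joint ITD-F0 analysis described above can be illustrated with a minimal, single-channel sketch. This is not the paper's RTNN implementation (which operates per gammatone frequency channel with recurrent delay loops); it is a hypothetical stand-in in which each node of a 2-D grid is tuned to one (ITD, F0 period) pair: one ear's signal is delayed by the candidate ITD and multiplied with the other (half-wave rectified, loosely analogous to a coincidence detector's firing), and periodicity analysis is then run on that coincidence output at each candidate period. The function name and all parameters are illustrative assumptions.

```python
import numpy as np

def itd_f0_map(left, right, fs, max_itd=16, min_f0=80.0, max_f0=300.0):
    """Joint ITD-F0 activity map: a simplified, broadband stand-in for the
    two-dimensional recurrent timing network sketched in the abstract.
    ITDs are expressed in samples; F0 candidates as periods in samples.
    """
    n = len(left)
    itds = np.arange(-max_itd, max_itd + 1)               # candidate lags (samples)
    periods = np.arange(int(fs / max_f0), int(fs / min_f0) + 1)
    activity = np.zeros((len(itds), len(periods)))
    for i, d in enumerate(itds):
        # Align the two ears at this candidate ITD.
        if d >= 0:
            l, r = left[d:], right[:n - d]
        else:
            l, r = left[:n + d], right[-d:]
        # Coincidence-detector output: rectified product of aligned ears.
        c = np.maximum(l * r, 0.0)
        for j, p in enumerate(periods):
            # Periodicity analysis at this best-ITD: autocorrelation of the
            # coincidence output at the candidate F0 period.
            activity[i, j] = np.dot(c[:-p], c[p:])
    return itds, periods, activity
```

Under this sketch, a single periodic source produces a peak in the activity grid at its (ITD, F0) coordinates; two spatially separated talkers with different F0s would form two distinct peaks, which is the basis on which the paper segregates sources in ITD-F0 space.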