A speech fragment approach to localising multiple speakers in reverberant environments

  • Authors:
  • Heidi Christensen;Ning Ma;Stuart N. Wrigley;Jon Barker

  • Affiliations:
  • Department of Computer Science, University of Sheffield, United Kingdom;Department of Computer Science, University of Sheffield, United Kingdom;Department of Computer Science, University of Sheffield, United Kingdom;Department of Computer Science, University of Sheffield, United Kingdom

  • Venue:
  • ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
  • Year:
  • 2009

Abstract

Sound source localisation cues are severely degraded when multiple acoustic sources are active in the presence of reverberation. We present a binaural system for localising simultaneous speakers which exploits the fact that in a speech mixture there exist spectro-temporal regions, or ‘fragments’, where the energy is dominated by just one of the speakers. A fragment-level localisation model is proposed that integrates the localisation cues within a fragment using a weighted mean. The weights are based on local estimates of the degree of reverberation in a given spectro-temporal cell. The paper investigates three weight estimation approaches, based on: i) an established model of the perceptual precedence effect; ii) a measure of interaural coherence between the left and right ear signals; and iii) a data-driven approach trained in matched acoustic conditions. Experiments on reverberant binaural data with two simultaneous speakers show that appropriate weighting can improve frame-based localisation performance by up to 24%.
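
The fragment-level weighted mean described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `interaural_coherence` and `fragment_location`, the representation of each cue as an azimuth in degrees, and the use of the peak normalised cross-correlation as the coherence measure are all assumptions made for illustration. The abstract does not specify the exact cue or coherence definition.

```python
import math


def interaural_coherence(left, right, max_lag):
    """Peak of the normalised cross-correlation between two ear signals.

    Assumption: coherence is taken as the maximum normalised
    cross-correlation over lags in [-max_lag, max_lag]. Values near 1
    suggest a cell dominated by direct sound; lower values suggest
    stronger reverberation.
    """
    if len(left) != len(right):
        raise ValueError("left and right must have the same length")
    energy = math.sqrt(sum(x * x for x in left) * sum(y * y for y in right))
    if energy == 0.0:
        return 0.0
    n = len(left)
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                acc += left[i] * right[j]
        best = max(best, acc / energy)
    return best


def fragment_location(cell_azimuths, cell_weights):
    """Weighted mean of per-cell azimuth estimates within one fragment.

    Assumption: azimuths are in degrees over a limited frontal range,
    so a plain (non-circular) weighted mean is adequate.
    """
    if len(cell_azimuths) != len(cell_weights):
        raise ValueError("need one weight per cell")
    total = sum(cell_weights)
    if total <= 0.0:
        raise ValueError("weights must sum to a positive value")
    return sum(a * w for a, w in zip(cell_azimuths, cell_weights)) / total


# Hypothetical fragment: three reliable cells near 30 degrees and one
# reverberation-corrupted cell that receives a low weight.
azimuths = [28.0, 31.0, 30.0, -10.0]
weights = [0.9, 0.8, 1.0, 0.1]
weighted = fragment_location(azimuths, weights)
unweighted = fragment_location(azimuths, [1.0] * len(azimuths))
```

In this made-up example the low-weight outlier pulls the unweighted mean well away from 30 degrees, while the weighted estimate stays close to the three reliable cells. Any of the three weighting schemes named in the abstract could supply `cell_weights`; the coherence function above stands in for the second one only.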