A speech fragment approach to localising multiple speakers in reverberant environments

Authors:
Heidi Christensen;Ning Ma;Stuart N. Wrigley;Jon Barker
Affiliations:
Department of Computer Science, University of Sheffield, United Kingdom;Department of Computer Science, University of Sheffield, United Kingdom;Department of Computer Science, University of Sheffield, United Kingdom;Department of Computer Science, University of Sheffield, United Kingdom
Venue:
ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
Year:
2009

Abstract

Sound source localisation cues are severely degraded when multiple acoustic sources are active in the presence of reverberation. We present a binaural system for localising simultaneous speakers which exploits the fact that a speech mixture contains spectro-temporal regions, or ‘fragments’, in which the energy is dominated by just one of the speakers. A fragment-level localisation model is proposed that integrates the localisation cues within a fragment using a weighted mean. The weights are based on local estimates of the degree of reverberation in a given spectro-temporal cell. The paper investigates three weight estimation approaches, based on (i) an established model of the perceptual precedence effect; (ii) a measure of interaural coherence between the left and right ear signals; and (iii) a data-driven approach trained in matched acoustic conditions. Experiments on reverberant binaural data with two simultaneous speakers show that appropriate weighting can improve frame-based localisation performance by up to 24%.
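
The fragment-level weighted mean described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `interaural_coherence` and `fragment_cue`, the choice of interaural time difference as the per-cell cue, and the use of the raw coherence value as a weight are all assumptions made for illustration. The abstract does not specify how coherence is mapped to a weight.

```python
import numpy as np


def interaural_coherence(left, right, max_lag):
    """Peak of the normalised cross-correlation of two equal-length frames.

    Hypothetical helper: searches lags in [-max_lag, +max_lag] samples.
    Values near 1 suggest a cell dominated by direct sound.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    denom = np.sqrt(np.sum(left ** 2) * np.sum(right ** 2))
    if denom == 0.0:
        return 0.0  # silent frame: treat as carrying no reliable cue
    xcorr = np.correlate(left, right, mode="full")
    centre = len(left) - 1  # index of zero lag in the "full" output
    window = xcorr[centre - max_lag:centre + max_lag + 1]
    return float(np.max(window) / denom)


def fragment_cue(cues, weights, mask):
    """Weighted mean of per-cell localisation cues over one fragment.

    cues, weights: arrays of shape (channels, frames); mask: boolean array
    of the same shape marking the cells that belong to the fragment.
    """
    cues = np.asarray(cues, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("fragment mask selects no cells")
    w = weights[mask]
    c = cues[mask]
    if w.sum() <= 0.0:
        return float(c.mean())  # assumed fallback: unweighted mean
    return float(np.sum(w * c) / np.sum(w))


# Toy fragment: two frequency channels by two frames, cue in milliseconds.
cues = np.array([[0.1, 0.5], [0.1, 0.9]])
weights = np.array([[1.0, 0.0], [1.0, 0.0]])  # second frame judged reverberant
mask = np.ones_like(cues, dtype=bool)
estimate = fragment_cue(cues, weights, mask)
```

In this toy case the cells with zero weight are ignored, so the fragment estimate equals the cue shared by the reliable cells. A precedence-effect model or a trained mapping would supply the `weights` array in place of the coherence-based values.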