To separate speech: a system for recognizing simultaneous speech

  • Authors:
  • John McDonough; Kenichi Kumatani; Tobias Gehrig; Emilian Stoimenov; Uwe Mayer; Stefan Schacht; Matthias Wölfel; Dietrich Klakow

  • Affiliations:
  • Spoken Language Systems, Saarland University, Saarbrücken, Germany and Institute for Intelligent Sensor-Actuator Systems, University of Karlsruhe, Germany; IDIAP Research Institute, Martigny, Switzerland and Institute for Intelligent Sensor-Actuator Systems, University of Karlsruhe, Germany; Institute for Theoretical Computer Science, University of Karlsruhe, Germany; Institute for Theoretical Computer Science, University of Karlsruhe, Germany; Institute for Theoretical Computer Science, University of Karlsruhe, Germany; Spoken Language Systems, Saarland University, Saarbrücken, Germany; Institute for Theoretical Computer Science, University of Karlsruhe, Germany; Spoken Language Systems, Saarland University, Saarbrücken, Germany

  • Venue:
  • MLMI'07: Proceedings of the 4th International Conference on Machine Learning for Multimodal Interaction
  • Year:
  • 2007

Abstract

The PASCAL Speech Separation Challenge (SSC) is based on a corpus of sentences from the Wall Street Journal task read by two speakers simultaneously and captured with two circular eight-channel microphone arrays. This work describes our system for the recognition of such simultaneous speech. Our system has four principal components: a person tracker returns the locations of both active speakers, as well as segmentation information for the two utterances, which are often of unequal length; two beamformers in generalized sidelobe canceller (GSC) configuration separate the simultaneous speech by setting their active weight vectors according to a minimum mutual information (MMI) criterion; a postfilter and binary mask operating on the outputs of the beamformers further enhance the separated speech; and finally, an automatic speech recognition (ASR) engine based on weighted finite-state transducers (WFSTs) returns the most likely word hypotheses for the separated streams. In addition to optimizing each of these components, we investigated the effect of the filter bank design used to perform subband analysis and synthesis during beamforming. On the SSC development data, our system achieved a word error rate of 39.6%.
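To illustrate the GSC structure underlying the separation stage, the following is a minimal narrowband sketch, not the authors' implementation: a quiescent beamformer steered toward the desired speaker, a blocking matrix that removes the desired signal from the adaptive branch, and an active weight vector `w_a` (the quantity adapted under the MMI criterion in the paper; here it is simply a fixed argument). All function and variable names are illustrative assumptions.

```python
import numpy as np


def gsc_beamform(X, d, w_a=None):
    """Minimal narrowband GSC sketch (illustrative only).

    X   : (n_ch, n_frames) complex subband snapshots from the array
    d   : (n_ch,) steering vector toward the desired speaker
    w_a : (n_ch - 1,) active weight vector; in the paper this is the
          part optimized under the MMI criterion, here it is fixed
    """
    n_ch = d.shape[0]
    norm = d.conj() @ d
    # Quiescent (fixed) beamformer: distortionless response toward d.
    w_q = d / norm
    # Blocking matrix B: columns span the orthogonal complement of d,
    # so the adaptive branch carries no desired-speaker signal.
    P = np.eye(n_ch) - np.outer(d, d.conj()) / norm
    U, _, _ = np.linalg.svd(P)
    B = U[:, : n_ch - 1]
    if w_a is None:
        w_a = np.zeros(n_ch - 1, dtype=complex)
    # GSC output: quiescent path minus the adaptively filtered branch.
    w = w_q - B @ w_a
    return w.conj() @ X
```

Because `B` is orthogonal to `d`, any choice of `w_a` leaves a signal arriving exactly from the look direction undistorted; adaptation can therefore suppress the interfering speaker without cancelling the desired one, which is what makes the MMI optimization of the two beamformers' active weights well-posed.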