Integration of beamforming and uncertainty-of-observation techniques for robust ASR in multi-source environments

  • Authors:
  • Ramón Fernandez Astudillo; Dorothea Kolossa; Alberto Abad; Steffen Zeiler; Rahim Saeidi; Pejman Mowlaee; João Paulo da Silva Neto; Rainer Martin

  • Affiliations:
  • Spoken Language Systems Lab, INESC-ID, Lisbon, Portugal; Institute of Communication Acoustics, Ruhr-Universität Bochum, Germany; Spoken Language Systems Lab, INESC-ID, Lisbon, Portugal; Institute of Communication Acoustics, Ruhr-Universität Bochum, Germany; Centre for Language and Speech Technology, Radboud University Nijmegen, The Netherlands; Institute of Communication Acoustics, Ruhr-Universität Bochum, Germany; Spoken Language Systems Lab, INESC-ID, Lisbon, Portugal; Institute of Communication Acoustics, Ruhr-Universität Bochum, Germany

  • Venue:
  • Computer Speech and Language
  • Year:
  • 2013

Abstract

This paper presents a new approach for increasing the robustness of multi-channel automatic speech recognition in noisy and reverberant multi-source environments. The proposed method uses uncertainty propagation techniques to dynamically compensate the speech features and the acoustic models for the observation uncertainty determined at the beamforming stage. We present and analyze two methods that allow integrating classical multi-channel signal processing approaches, such as delay-and-sum beamformers or Zelinski-type Wiener filters, with uncertainty-of-observation techniques such as uncertainty decoding or modified imputation. An analysis of the results on the PASCAL-CHiME task shows that this approach consistently outperforms conventional beamformers with a minimal increase in computational complexity. The use of dynamic compensation based on observation uncertainty also outperforms conventional static adaptation, without requiring adaptation data.
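The core idea of pairing a beamformer with an observation-uncertainty estimate can be illustrated with a minimal sketch. This is not the paper's method: it is a hypothetical time-domain delay-and-sum beamformer with integer sample delays, where the residual inter-channel variance after alignment serves as a crude stand-in for the per-sample observation uncertainty that the paper propagates into the ASR decoder. Function and variable names are illustrative assumptions.

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Hypothetical delay-and-sum beamformer with a crude uncertainty proxy.

    channels: (M, N) array of M microphone signals of length N.
    delays:   per-channel integer delays in samples (positive = channel lags,
              so its samples are advanced to align with the reference).
    Returns the beamformed signal and a per-sample uncertainty estimate.
    """
    M, N = channels.shape
    aligned = np.zeros((M, N))
    for m in range(M):
        d = int(delays[m])
        if d >= 0:
            # Advance a lagging channel: drop its first d samples.
            aligned[m, : N - d] = channels[m, d:]
        else:
            # Delay a leading channel: shift it right by |d| samples.
            aligned[m, -d:] = channels[m, : N + d]
    beam = aligned.mean(axis=0)
    # Where the channels disagree after alignment, the beamformed
    # observation is less reliable; use the inter-channel variance as
    # an (assumed, simplistic) observation-uncertainty estimate.
    uncertainty = aligned.var(axis=0)
    return beam, uncertainty
```

In an uncertainty-of-observation pipeline, such a per-sample (or, after feature extraction, per-feature) uncertainty would be propagated through the front end and used by uncertainty decoding or modified imputation in the recognizer, rather than discarded as it is in a conventional beamformer.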