Incorporating local information of the acoustic environments to MAP-based feature compensation and acoustic model adaptation

  • Authors:
  • Yu Tsao;Xugang Lu;Paul Dixon;Ting-Yao Hu;Shigeki Matsuda;Chiori Hori

  • Venue:
  • Computer Speech and Language
  • Year:
  • 2014

Abstract

The maximum a posteriori (MAP) criterion is widely used for feature compensation (FC) and acoustic model adaptation (MA) to reduce the mismatch between training and testing data. MAP-based FC and MA require prior densities for the mapping function parameters, and the design of these prior densities strongly affects performance. In this paper, we propose an environment structuring framework that provides suitable prior densities for MAP-based FC and MA in robust speech recognition. The framework is a two-stage hierarchical tree structure built by environment clustering and partitioning processes. It characterizes local information about complex speaker and speaking acoustic conditions. This local information is used to specify the hyper-parameters of the prior densities, which are then used in MAP-based FC and MA to reduce the mismatch. We evaluated the proposed framework on Aurora-2, a connected digit recognition task, and Aurora-4, a large vocabulary continuous speech recognition (LVCSR) task. On both tasks, experimental results showed that the environment structuring framework yielded suitable prior densities that improved the performance of MAP-based FC and MA.
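
To illustrate how prior hyper-parameters enter a MAP estimate, the following is a minimal sketch and not the paper's method. It assumes the simplest possible mapping function, a single additive bias, with a Gaussian likelihood and a Gaussian prior whose mean and variance stand in for hyper-parameters taken from one local environment cluster. The function name `map_bias` and all parameter names are hypothetical.

```python
import numpy as np


def map_bias(residuals, prior_mean, prior_var, noise_var):
    """MAP estimate of an additive bias b under a Gaussian prior.

    Assumed model (illustrative only):
        residual_t ~ N(b, noise_var)      likelihood of each adaptation frame
        b          ~ N(prior_mean, prior_var)   prior from a local cluster

    residuals: array of shape (n_frames, dim)
    prior_mean: array of shape (dim,) or a scalar
    prior_var, noise_var: positive scalars
    """
    residuals = np.asarray(residuals, dtype=float)
    n = residuals.shape[0]
    # Posterior precision is the sum of data precision and prior precision.
    precision = n / noise_var + 1.0 / prior_var
    # Precision-weighted combination of the data sum and the prior mean.
    weighted = residuals.sum(axis=0) / noise_var + np.asarray(prior_mean) / prior_var
    return weighted / precision


# Two adaptation frames with residuals 1.0 and 3.0 (sample mean 2.0).
frames = np.array([[1.0], [3.0]])
tight_prior = map_bias(frames, prior_mean=0.0, prior_var=1.0, noise_var=1.0)
loose_prior = map_bias(frames, prior_mean=0.0, prior_var=1e9, noise_var=1.0)
```

With little adaptation data, the estimate stays close to the prior mean, so a prior that reflects the local acoustic environment matters most in that regime. As the prior variance grows, the estimate approaches the sample mean of the residuals, which is the maximum likelihood solution under this assumed model.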