Improving robustness of MLLR adaptation with speaker-clustered regression class trees

  • Authors:
  • Arindam Mandal; Mari Ostendorf; Andreas Stolcke

  • Affiliations:
  • University of Washington, Department of Electrical Engineering, Seattle, WA, USA; University of Washington, Department of Electrical Engineering, Seattle, WA, USA; Speech Technology and Research Laboratory, SRI International, 333 Ravenswood Avenue, Menlo Park, CA 94025, USA and International Computer Science Institute, Berkeley, CA, USA

  • Venue:
  • Computer Speech and Language
  • Year:
  • 2009

Abstract

We introduce a strategy for modeling speaker variability in speaker adaptation based on maximum likelihood linear regression (MLLR). The approach uses a speaker-clustering procedure that models speaker variability by partitioning a large corpus of speakers in the eigenspace of their MLLR transformations and learning cluster-specific regression class tree structures. We present experiments showing that choosing the appropriate regression class tree structure for speakers leads to a significant reduction in overall word error rates in automatic speech recognition systems. To realize these gains in unsupervised adaptation, we describe an algorithm that produces a linear combination of MLLR transformations from cluster-specific trees using weights estimated by maximizing the likelihood of a speaker's adaptation data. This algorithm produces small improvements in overall recognition performance across a range of tasks for both English and Mandarin. More significantly, distributional analysis shows that it reduces the number of speakers with performance loss due to adaptation across a range of adaptation data sizes and word error rates.
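
The weight-estimation step mentioned in the abstract can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: Gaussian occupation posteriors are held fixed, covariances are diagonal, only the means are adapted, and the weights are unconstrained. Under those assumptions, maximizing the likelihood of the adaptation data reduces to a weighted least-squares problem. The function name estimate_combination_weights and all argument names are hypothetical, and this is not necessarily the authors' algorithm.

```python
import numpy as np


def estimate_combination_weights(obs, post, means, inv_vars, transforms):
    """Sketch: ML weights for a linear combination of MLLR mean transforms.

    Assumed (hypothetical) inputs:
      obs        (T, d) adaptation frames
      post       (T, M) Gaussian occupation posteriors, held fixed
      means      (M, d) speaker-independent Gaussian means
      inv_vars   (M, d) inverse diagonal variances
      transforms list of K pairs (A, b), one per cluster-specific tree,
                 with A of shape (d, d) and b of shape (d,)
    Returns a (K,) weight vector.
    """
    # Means under each candidate transform, A @ mu + b: shape (K, M, d).
    cand = np.stack([means @ A.T + b for A, b in transforms])
    num_transforms = cand.shape[0]

    occ = post.sum(axis=0)   # (M,) total occupancy per Gaussian
    first = post.T @ obs     # (M, d) posterior-weighted sums of frames

    # Accumulate the normal equations G w = k of the weighted least squares.
    G = np.zeros((num_transforms, num_transforms))
    k = np.zeros(num_transforms)
    for m in range(means.shape[0]):
        Mm = cand[:, m, :]               # (K, d) candidate means for Gaussian m
        scaled = Mm * inv_vars[m]        # apply the inverse variances
        G += occ[m] * (scaled @ Mm.T)
        k += scaled @ first[m]

    # lstsq tolerates a rank-deficient G, e.g. near-identical transforms.
    return np.linalg.lstsq(G, k, rcond=None)[0]
```

The adapted mean of Gaussian m is then the weighted sum of its K candidate means. A practical system would normally re-estimate the posteriors and might constrain the weights, neither of which this sketch attempts.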