Improving robustness of MLLR adaptation with speaker-clustered regression class trees

Authors:
Arindam Mandal;Mari Ostendorf;Andreas Stolcke
Affiliations:
University of Washington, Department of Electrical Engineering, Seattle, WA, USA;University of Washington, Department of Electrical Engineering, Seattle, WA, USA;Speech Technology and Research Laboratory, SRI International, 333 Ravenswood Avenue, Menlo Park, CA 94025, USA and International Computer Science Institute, Berkeley, CA, USA
Venue:
Computer Speech and Language
Year:
2009

Abstract

We introduce a strategy for modeling speaker variability in speaker adaptation based on maximum likelihood linear regression (MLLR). The approach uses a speaker-clustering procedure that models speaker variability by partitioning a large corpus of speakers in the eigenspace of their MLLR transformations and learning cluster-specific regression class tree structures. We present experiments showing that choosing the appropriate regression class tree structure for speakers leads to a significant reduction in overall word error rates in automatic speech recognition systems. To realize these gains in unsupervised adaptation, we describe an algorithm that produces a linear combination of MLLR transformations from cluster-specific trees using weights estimated by maximizing the likelihood of a speaker's adaptation data. This algorithm produces small improvements in overall recognition performance across a range of tasks for both English and Mandarin. More significantly, distributional analysis shows that it reduces the number of speakers with performance loss due to adaptation across a range of adaptation data sizes and word error rates.
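
The weight-estimation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's estimation procedure, so everything below is an assumption made for illustration: each frame is hard-aligned to a single Gaussian with a diagonal covariance, each cluster contributes one global mean transform (A_k, b_k) rather than a full regression class tree, and the weights are unconstrained. Under those assumptions the adapted mean is a linear function of the weights, and the maximum likelihood weights have a closed-form weighted least-squares solution. The function name `combine_transform_weights` and all of its parameters are hypothetical.

```python
import numpy as np


def combine_transform_weights(frames, align, means, variances, transforms):
    """Estimate ML weights for a linear combination of mean transforms.

    Illustrative assumptions (not taken from the paper):
      frames:     (T, D) adaptation feature vectors
      align:      (T,) index of the Gaussian each frame is aligned to
      means:      (G, D) speaker-independent Gaussian means
      variances:  (G, D) diagonal covariances
      transforms: list of K pairs (A, b), A of shape (D, D), b of shape (D,)
    Returns the (K,) weight vector maximizing the Gaussian log-likelihood.
    """
    # adapted[g, :, k] is mean g after applying cluster k's transform.
    adapted = np.stack([means @ A.T + b for A, b in transforms], axis=-1)
    num_clusters = adapted.shape[-1]

    # Accumulate the normal equations of the weighted least-squares problem.
    lhs = np.zeros((num_clusters, num_clusters))
    rhs = np.zeros(num_clusters)
    for x, g in zip(frames, align):
        M = adapted[g]                              # (D, K) candidate means
        M_scaled = M / variances[g][:, None]        # inverse covariance times M
        lhs += M.T @ M_scaled
        rhs += M_scaled.T @ x
    return np.linalg.solve(lhs, rhs)
```

The combined adapted mean for Gaussian g is then `adapted[g] @ weights`. A practical system would accumulate soft occupation counts over mixture components and apply the combination per regression class; this sketch omits both for brevity.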