The Nature of Statistical Learning Theory
An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods
A comparison of novel techniques for rapid speaker adaptation
Speech Communication
Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond
The design for the Wall Street Journal-based CSR corpus
HLT '91: Proceedings of the Workshop on Speech and Natural Language
Embedded kernel eigenvoice speaker adaptation and its implication to reference speaker weighting
IEEE Transactions on Audio, Speech, and Language Processing
Kernel Eigenspace-Based MLLR Adaptation
IEEE Transactions on Audio, Speech, and Language Processing
This paper proposes a nonlinear generalization of the popular maximum-likelihood linear regression (MLLR) adaptation algorithm using kernel methods. The proposed method, called maximum penalized likelihood kernel regression adaptation (MPLKR), applies kernel regression with appropriate regularization to determine the affine model transform in a kernel-induced high-dimensional feature space. Although this is not the first attempt to apply kernel methods to conventional linear adaptation algorithms, MPLKR, unlike most other kernelized adaptation methods such as kernel eigenvoice or kernel eigen-MLLR, has the advantage of being a convex optimization problem whose solution is always guaranteed to be globally optimal. In fact, the adapted Gaussian means can be obtained analytically by simply solving a system of linear equations. From the Bayesian perspective, MPLKR can also be considered the kernel version of maximum a posteriori linear regression (MAPLR) adaptation. Supervised and unsupervised speaker adaptation using MPLKR were evaluated on the Resource Management and Wall Street Journal 5K tasks, respectively, achieving word error rate reductions of 23.6% and 15.5% over the speaker-independent model.
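For intuition, here is a minimal sketch of the kind of computation the abstract describes: a kernel regression with a quadratic penalty whose convex objective reduces to a single linear system. It uses plain kernel ridge regression as a stand-in for the paper's penalized-likelihood objective; the RBF kernel, the regularizer `lam`, and the toy means/targets are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.1):
    # Gram matrix of an RBF kernel between the row vectors of X and Y.
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq)

def kernel_ridge_adapt(mu_si, targets, lam=1e-2, gamma=0.1):
    """Toy analogue of the MPLKR idea: regress adaptation targets on the
    speaker-independent means through a kernel, with a quadratic penalty.
    The penalized least-squares objective is convex, so its unique global
    minimizer comes from solving one linear system: (K + lam*I) alpha = targets.
    """
    K = rbf_kernel(mu_si, mu_si, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(K)), targets)
    return K @ alpha  # adapted Gaussian means

# Hypothetical toy data: 10 Gaussian means in a 3-dimensional acoustic space.
rng = np.random.default_rng(0)
mu_si = rng.normal(size=(10, 3))                   # speaker-independent means
targets = mu_si + 0.1 * rng.normal(size=(10, 3))   # stand-in adaptation statistics
mu_adapted = kernel_ridge_adapt(mu_si, targets)
```

The design point the abstract emphasizes is visible here: because the penalized objective is convex, adaptation needs no iterative nonconvex search, just one call to a linear solver.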