An on-line speaker adaptation method for HMM-based speech recognizers

Authors:
András Bánhalmi;András Kocsor
Affiliations:
Hungarian Academy of Sciences and University of Szeged, Szeged, Hungary;Hungarian Academy of Sciences and University of Szeged, Szeged, Hungary
Venue:
Acta Cybernetica
Year:
2008

Abstract

In the past few years, numerous techniques have been proposed to improve the efficiency of basic adaptation methods such as MLLR and MAP. These methods share a common aim: to increase the likelihood of the phoneme models for a particular speaker. To operate, they need a precise phonetic segmentation of the actual utterance, but this segmentation is often faulty. To improve overall performance, only the well-segmented frames of the spoken sentence should be retained, and incorrectly segmented data should be excluded from adaptation. Several heuristic algorithms for selecting reliably segmented data blocks have been proposed in the literature; here we suggest some new heuristics that discriminate between faulty and well-segmented data. We examine the effect of each method on the efficiency of speech recognition with speaker adaptation and draw conclusions for each. Besides post-filtering the set of segmented adaptation examples, another way to improve the adaptation method is to create a more precise segmentation, which should reduce the chance of faulty data samples being included. We also suggest such a method, based on a scoring procedure for N-best lists that takes phoneme duration into account.
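
The two ideas in the abstract (rescoring N-best hypotheses with phoneme duration, and keeping only reliably segmented data for adaptation) can be illustrated with a minimal sketch. It is not the authors' algorithm: the Gaussian duration model, the `weight` parameter, the average log-likelihood threshold, and all names (`Segment`, `rescore`, `best_hypothesis`, `reliable_segments`) are assumptions made here for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    phoneme: str
    start: int             # first frame index (inclusive)
    end: int               # last frame index (exclusive); assumed > start
    log_likelihood: float  # acoustic log-likelihood summed over the frames


def duration_log_prob(n_frames, mean, std):
    """Gaussian log-density of a phoneme duration (assumed model form)."""
    z = (n_frames - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def rescore(hypothesis, duration_stats, weight=1.0):
    """Acoustic score plus a weighted duration score, summed over segments."""
    total = 0.0
    for seg in hypothesis:
        mean, std = duration_stats[seg.phoneme]
        n_frames = seg.end - seg.start
        total += seg.log_likelihood
        total += weight * duration_log_prob(n_frames, mean, std)
    return total


def best_hypothesis(n_best, duration_stats, weight=1.0):
    """Pick the segmentation from the N-best list with the highest rescored value."""
    return max(n_best, key=lambda h: rescore(h, duration_stats, weight))


def reliable_segments(hypothesis, min_avg_ll):
    """Post-filter: keep segments whose per-frame log-likelihood is high enough
    (a hypothetical reliability heuristic, not one taken from the paper)."""
    return [
        seg for seg in hypothesis
        if seg.log_likelihood / (seg.end - seg.start) >= min_avg_ll
    ]
```

In this sketch, a hypothesis with a slightly better acoustic score but implausible phoneme durations loses to one whose durations match the per-phoneme statistics. Only the segments that pass `reliable_segments` would then be handed to an adaptation step such as MLLR or MAP.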