An empirical study on multiple LVCSR model combination by machine learning

Authors:
Takehito Utsuro;Yasuhiro Kodama;Tomohiro Watanabe;Hiromitsu Nishizaki;Seiichi Nakagawa
Affiliations:
Kyoto University, Kyoto, Japan;Sony Corporation;Toyohashi University of Technology;University of Yamanashi;Toyohashi University of Technology
Venue:
HLT-NAACL-Short '04 Proceedings of HLT-NAACL 2004: Short Papers
Year:
2004

Abstract

This paper proposes applying machine learning techniques to the task of combining the outputs of multiple large-vocabulary continuous speech recognition (LVCSR) models. The proposed technique has advantages over voting schemes such as ROVER, especially when the majority of the participating models are unreliable. Within this machine learning framework, features such as the IDs of the models that output a hypothesized word are useful for improving the word recognition rate. Experimental results show that the combined output achieves a relative word error reduction of up to 39% against the best-performing single model and of up to 23% against ROVER. We further show empirically that the combination performs better when the LVCSR models to be combined are chosen so as to cover as many correctly recognized words as possible, rather than in descending order of their word correct rates.
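
The contrast between the two model-selection strategies in the last sentence can be illustrated with a minimal sketch. The abstract does not give the paper's actual selection procedure, so the greedy coverage heuristic, the function names, and the toy data below are all assumptions made for illustration. Each model is represented only by the set of reference word positions it recognizes correctly.

```python
from typing import Dict, List, Set


def top_k_by_correct_rate(correct: Dict[str, Set[int]], k: int) -> List[str]:
    """Pick the k models that individually recognize the most words correctly."""
    ranked = sorted(correct, key=lambda m: (-len(correct[m]), m))
    return ranked[:k]


def greedy_by_coverage(correct: Dict[str, Set[int]], k: int) -> List[str]:
    """Greedily pick k models, each time adding the model that contributes
    the most correctly recognized words not yet covered (an assumed heuristic,
    not necessarily the procedure used in the paper)."""
    chosen: List[str] = []
    covered: Set[int] = set()
    remaining = sorted(correct)  # sorted so that ties break deterministically
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda m: len(correct[m] - covered))
        chosen.append(best)
        covered |= correct[best]
        remaining.remove(best)
    return chosen


def coverage(correct: Dict[str, Set[int]], models: List[str]) -> int:
    """Number of distinct words recognized correctly by at least one chosen model."""
    covered: Set[int] = set()
    for m in models:
        covered |= correct[m]
    return len(covered)


# Hypothetical toy data: word positions each model gets right.
# Models A, B, and C are strong but make the same mistakes;
# model D is weaker but correct on words the others miss.
correct = {
    "A": {0, 1, 2, 3, 4, 5, 6, 7},
    "B": {0, 1, 2, 3, 4, 5, 6},
    "C": {0, 1, 2, 3, 4, 5},
    "D": {8, 9, 10, 11},
}

by_rate = top_k_by_correct_rate(correct, 2)
by_cover = greedy_by_coverage(correct, 2)
print(by_rate, coverage(correct, by_rate))
print(by_cover, coverage(correct, by_cover))
```

In this toy setting, ranking by individual correct rate selects two models with overlapping strengths, while the coverage-driven choice pairs the strongest model with a complementary one. A learned combiner can only output a correct word if at least one selected model hypothesized it, which is one plausible reading of why coverage matters.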