Reliable unseen model prediction for vocabulary-independent speech recognition

Authors:
Sungtak Kim; Hoirin Kim
Affiliations:
School of Engineering, Information & Communications University, Daejeon, Korea (both authors)
Venue:
AI'04: Proceedings of the 17th Australian Joint Conference on Advances in Artificial Intelligence
Year:
2004

Abstract

Speech recognition technology is expected to have a great impact on many user-interface areas such as toys, mobile phones, PDAs, and home appliances. These applications require speech recognition that is robust to environmental and channel noise, but the dialogue pattern used to interact with the devices may be relatively simple, that is, of an isolated-word type. The drawback of the small-vocabulary isolated-word recognizers generally used in such applications is that, whenever the target vocabulary changes, the acoustic models must be retrained to maintain high performance. However, if phone-model-based speech recognition is combined with reliable unseen model prediction, the acoustic models need not be retrained to achieve higher performance. In this paper, we propose several reliable methods for unseen model prediction in flexible-vocabulary speech recognition. The first method determines optimal threshold values for the stop criteria in decision tree growing, and the second adds a condition to question selection in order to overcome the over-balancing phenomenon of the conventional method. The third proposes two-stage decision trees, which obtain more properly trained models in the first stage and build more reliable unseen models in the second stage. Various vocabulary-independent situations were examined to clearly show the effectiveness of the proposed methods. In the experiments, the three proposed methods reduced the average word error rate by 32.8%, 41.4%, and 44.1%, respectively, compared with the conventional method. From these results, we conclude that the proposed methods are very effective for unseen model prediction in vocabulary-independent speech recognition.
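The proposed methods all modify conventional tree-based state tying, in which a triphone never seen in training is given a model by descending a phonetic decision tree to a leaf. The sketch below illustrates only that conventional baseline, not the authors' proposed methods: it grows a tree greedily by log-likelihood gain and stops splitting under two criteria, a minimum likelihood gain (`min_gain`) and a minimum occupancy per child (`min_occ`), which are the kind of thresholds the first proposed method tunes. The single 1-D Gaussian per state, the question set, the toy data, and the names `grow`, `predict`, `min_gain`, and `min_occ` are hypothetical simplifications; neither the over-balancing condition nor the two-stage trees are implemented here.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class StateStats:
    # Sufficient statistics of one seen triphone state (1-D feature, a simplification)
    left: str    # left-context phone
    right: str   # right-context phone
    occ: float   # state occupancy count
    mean: float
    var: float

# Hypothetical phonetic questions: (name, context side, phone set)
QUESTIONS = [
    ("L_Nasal", "left", {"m", "n", "ng"}),
    ("R_Nasal", "right", {"m", "n", "ng"}),
    ("L_Vowel", "left", {"a", "e", "i", "o", "u"}),
    ("R_Vowel", "right", {"a", "e", "i", "o", "u"}),
]

def cluster_loglik(states: List[StateStats]) -> float:
    """Approximate log-likelihood of pooling the states into one ML Gaussian."""
    occ = sum(s.occ for s in states)
    mean = sum(s.occ * s.mean for s in states) / occ
    ex2 = sum(s.occ * (s.var + s.mean ** 2) for s in states) / occ  # pooled E[x^2]
    var = max(ex2 - mean ** 2, 1e-6)
    return -0.5 * occ * (1.0 + math.log(2 * math.pi * var))

@dataclass
class Node:
    states: List[StateStats]
    question: Optional[Tuple[str, str, set]] = None
    yes: Optional["Node"] = None
    no: Optional["Node"] = None

def answers(q, left: str, right: str) -> bool:
    _, side, phones = q
    return (left if side == "left" else right) in phones

def grow(node: Node, min_gain: float, min_occ: float) -> Node:
    """Greedy top-down splitting with two stop criteria."""
    base = cluster_loglik(node.states)
    best = None
    for q in QUESTIONS:
        yes = [s for s in node.states if answers(q, s.left, s.right)]
        no = [s for s in node.states if not answers(q, s.left, s.right)]
        if not yes or not no:
            continue  # question does not split this node
        # Stop criterion 1: every child needs enough training data
        if sum(s.occ for s in yes) < min_occ or sum(s.occ for s in no) < min_occ:
            continue
        gain = cluster_loglik(yes) + cluster_loglik(no) - base
        if best is None or gain > best[0]:
            best = (gain, q, yes, no)
    # Stop criterion 2: the best split must gain at least min_gain
    if best is None or best[0] < min_gain:
        return node  # leaf = one tied state
    _, node.question, yes, no = best
    node.yes = grow(Node(yes), min_gain, min_occ)
    node.no = grow(Node(no), min_gain, min_occ)
    return node

def predict(node: Node, left: str, right: str) -> Node:
    """Map a (possibly unseen) triphone context to a leaf, i.e. a tied state."""
    while node.question is not None:
        node = node.yes if answers(node.question, left, right) else node.no
    return node

seen = [
    StateStats("m", "a", 50, 1.0, 0.2),
    StateStats("n", "e", 40, 1.1, 0.2),
    StateStats("a", "a", 60, 3.0, 0.3),
    StateStats("e", "o", 45, 3.2, 0.3),
]
root = grow(Node(seen), min_gain=5.0, min_occ=30.0)
leaf = predict(root, "ng", "u")  # context "ng _ u" never occurs in training
print([(s.left, s.right) for s in leaf.states])  # [('m', 'a'), ('n', 'e')]
```

Because the tree only asks about phone classes, the unseen left context "ng" is routed by the nasal question to the leaf trained on "m" and "n" contexts. Raising `min_gain` or `min_occ` yields fewer, better-trained leaves but coarser predictions, which is the trade-off that motivates choosing these thresholds carefully.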