Predicting unseen triphones with senones

  • Authors:
  • Mei-Yuh Hwang;Xuedong Huang;Fileno Alleva

  • Affiliations:
  • School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania;School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania;School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania

  • Venue:
  • ICASSP'93 Proceedings of the 1993 IEEE international conference on Acoustics, speech, and signal processing: speech processing - Volume II
  • Year:
  • 1993

Abstract

In large-vocabulary speech recognition, there are always triphones that are not covered by the training data. These unseen triphones are usually represented by the corresponding diphones or context-independent monophones. We propose using decision-tree-based senones to generate the senonic baseforms needed for unseen triphones. A decision tree is built for each individual Markov state of each phone, and the leaves of the trees constitute the senone codebook. To find its senone, a Markov state of any triphone traverses the corresponding tree until it reaches a leaf. We use the DARPA 5,000-word speaker-independent Wall Street Journal dictation task to evaluate the proposed method. The word error rate is reduced by more than 10% when unseen triphones are modeled by the decision-tree-based senones.
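
The tree lookup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The class names, the phonetic classes, the questions at each node, the senone numbering, and the choice of three states per phone are all hypothetical and serve only to show how a state of an unseen triphone could be routed to a leaf.

```python
# Toy illustration of senone lookup by decision-tree traversal.
# All names, questions, and senone ids below are assumptions made for this sketch.

NASALS = {"M", "N", "NG"}          # hypothetical phonetic class
VOWELS = {"AA", "AE", "IY", "UW"}  # hypothetical phonetic class


class Leaf:
    """A tree leaf; each leaf stands for one senone in the codebook."""

    def __init__(self, senone_id):
        self.senone_id = senone_id


class Node:
    """An internal node that asks a yes/no question about the phonetic context."""

    def __init__(self, question, yes, no):
        self.question = question  # callable: (left_context, right_context) -> bool
        self.yes = yes
        self.no = no


def find_senone(tree, left, right):
    """Walk from the root to a leaf, answering each context question."""
    node = tree
    while isinstance(node, Node):
        node = node.yes if node.question(left, right) else node.no
    return node.senone_id


# One tree per (phone, state index), as in the abstract. The trees are toy examples.
TREES = {
    ("K", 0): Node(lambda l, r: l in NASALS,
                   Leaf(0),
                   Node(lambda l, r: l in VOWELS, Leaf(1), Leaf(2))),
    ("K", 1): Leaf(3),
    ("K", 2): Node(lambda l, r: r in VOWELS, Leaf(4), Leaf(5)),
}


def senonic_baseform(left, phone, right, n_states=3):
    """Return the senone sequence for a triphone, seen in training or not."""
    return [find_senone(TREES[(phone, s)], left, right) for s in range(n_states)]


if __name__ == "__main__":
    # The triphone S-K+T needs no training examples of its own:
    # its context answers the questions and lands on existing leaves.
    print(senonic_baseform("S", "K", "T"))
```

Because the questions concern only the left and right context, any triphone reaches a leaf in every tree of its center phone, which is what lets unseen triphones reuse senones estimated from other contexts.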