Predicting unseen triphones with senones

Authors:
Mei-Yuh Hwang; Xuedong Huang; Fileno Alleva
Affiliations:
School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania (all authors)
Venue:
ICASSP'93 Proceedings of the 1993 IEEE international conference on Acoustics, speech, and signal processing: speech processing - Volume II
Year:
1993

Abstract

In large-vocabulary speech recognition, there are always triphones that the training data do not cover. These unseen triphones are usually represented by the corresponding diphones or context-independent monophones. We propose using decision-tree-based senones to generate the senonic baseforms needed for unseen triphones. A decision tree is built for each Markov state of each phone, and the leaves of the trees constitute the senone codebook. To find the senone it is associated with, a Markov state of any triphone traverses the corresponding tree until it reaches a leaf. We evaluate the proposed method on the DARPA 5,000-word speaker-independent Wall Street Journal dictation task. The word error rate is reduced by more than 10% when unseen triphones are modeled by decision-tree-based senones.
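
The lookup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the phone classes, the tree shapes, the senone numbers, and the choice of three states per phone are all hypothetical, chosen only to show how one tree per (phone, state) maps any triphone, seen or unseen, to a leaf senone.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Triphone:
    left: str   # left-context phone
    base: str   # the phone being modeled
    right: str  # right-context phone


@dataclass
class Leaf:
    senone_id: int  # index into the senone codebook (hypothetical numbering)


@dataclass
class Node:
    question: Callable[[Triphone], bool]  # yes/no question about the context
    yes: Union["Node", Leaf]
    no: Union["Node", Leaf]


# Hypothetical phone classes used by the questions below.
NASALS = {"m", "n", "ng"}
VOWELS = {"aa", "ae", "ah", "iy", "uw"}

# One tree per (phone, state index); only phone "k" is filled in here,
# and the three-state topology is an assumption of this sketch.
TREES = {
    ("k", 0): Node(lambda t: t.left in NASALS,
                   Leaf(0),
                   Node(lambda t: t.left in VOWELS, Leaf(1), Leaf(2))),
    ("k", 1): Leaf(3),
    ("k", 2): Node(lambda t: t.right in VOWELS, Leaf(4), Leaf(5)),
}


def find_senone(tree: Union[Node, Leaf], triphone: Triphone) -> int:
    """Walk from the root to a leaf by answering each node's question."""
    node = tree
    while isinstance(node, Node):
        node = node.yes if node.question(triphone) else node.no
    return node.senone_id


def senonic_baseform(triphone: Triphone, n_states: int = 3) -> list:
    """Return one senone per Markov state of the triphone."""
    return [find_senone(TREES[(triphone.base, s)], triphone)
            for s in range(n_states)]


# A triphone needs no training examples of its own: the questions only
# inspect its context, so every triphone ends at some leaf.
print(senonic_baseform(Triphone("ng", "k", "iy")))
print(senonic_baseform(Triphone("s", "k", "t")))
```

Because every question depends only on the phonetic context, the traversal is defined for any triphone whose base phone has trees, which is what lets an unseen triphone obtain a senonic baseform without falling back to a diphone or monophone model.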