Determining the location of phonemes is important to a number of speech applications, including training automatic speech recognition systems, building text-to-speech systems, and research on human speech processing. Human labelers agree on the location of phonemes, on average, 93.78% of the time within 20 ms on a variety of corpora, and 93.49% within 20 ms on the TIMIT corpus. We describe a baseline forced-alignment system and a proposed system that makes several modifications to this baseline. The modifications include adding energy-based features to the standard cepstral feature set, using probabilities of a state transition given an observation, and computing probabilities of distinctive phonetic features instead of phoneme-level probabilities. The baseline system achieves 91.48% agreement within 20 ms on the test partition of the TIMIT corpus, and the proposed system achieves 93.36% within 20 ms on the same corpus. The proposed system thus yields a 22% relative reduction in error over the baseline system, and a 14% reduction in error over results from a non-HMM alignment system. To our knowledge, this 93.36% agreement is the best reported result on the TIMIT corpus.
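The agreement and error-reduction figures above can be reproduced with a short calculation. The following is a minimal sketch, not the authors' evaluation code: the function names, the millisecond boundary lists, and the choice to count a boundary exactly 20 ms away as a match are all assumptions made for illustration.

```python
def agreement_within(reference_ms, hypothesis_ms, tolerance_ms=20.0):
    """Percentage of boundaries placed within tolerance_ms of the reference.

    Assumes both lists hold one boundary time (in ms) per phoneme boundary,
    in the same order, as is the case for forced alignment where the
    phoneme sequence is known in advance. Treating a difference of exactly
    tolerance_ms as a match is an assumption of this sketch.
    """
    if len(reference_ms) != len(hypothesis_ms):
        raise ValueError("boundary lists must have the same length")
    if not reference_ms:
        raise ValueError("boundary lists must not be empty")
    hits = sum(
        1 for ref, hyp in zip(reference_ms, hypothesis_ms)
        if abs(ref - hyp) <= tolerance_ms
    )
    return 100.0 * hits / len(reference_ms)


def relative_error_reduction(baseline_agreement, proposed_agreement):
    """Relative reduction in error (in percent), given two agreement percentages."""
    baseline_error = 100.0 - baseline_agreement
    proposed_error = 100.0 - proposed_agreement
    return 100.0 * (baseline_error - proposed_error) / baseline_error


# Hypothetical boundary times in ms: three of the four fall within 20 ms.
manual = [100.0, 250.0, 400.0, 610.0]
automatic = [105.0, 275.0, 395.0, 612.0]
print(agreement_within(manual, automatic))

# Agreement figures quoted in the abstract: baseline 91.48%, proposed 93.36%.
print(round(relative_error_reduction(91.48, 93.36)))
```

With the abstract's figures, the baseline error is 100 - 91.48 = 8.52% and the proposed system's error is 100 - 93.36 = 6.64%, so the relative reduction is (8.52 - 6.64) / 8.52, or about 22%.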