Millions of individuals have congenital or acquired neuro-motor conditions that limit control of their muscles, including those that manipulate the vocal tract. These conditions, collectively called dysarthria, result in speech that is very difficult to understand both for human listeners and for traditional automatic speech recognition (ASR), which in some cases is rendered completely unusable. In this work we first introduce a new method for acoustic-to-articulatory inversion that estimates vocal tract positions from acoustics using a nonlinear Hammerstein system, grounded in the theory of task-dynamics and evaluated on the TORGO database of dysarthric articulation. Our approach uses adaptive kernel canonical correlation analysis and is significantly more accurate than mixture density networks, at or above the 95% confidence level for most vocal tract variables. Next, we introduce a new method for ASR in which acoustic-based hypotheses are re-evaluated according to the likelihoods of their articulatory realizations under task-dynamics. This approach incorporates high-level, long-term aspects of speech production and is significantly more accurate than hidden Markov models, dynamic Bayesian networks, and switching Kalman filters.
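To make the kernel canonical correlation analysis (KCCA) component concrete, the following is a minimal sketch of *regularized* KCCA between two views (here, acoustic and articulatory feature matrices). It is not the adaptive KCCA described above; the kernel choice (RBF), the regularizer `reg`, and the function names are illustrative assumptions. It computes the leading squared canonical correlation via the standard regularized eigenproblem (Kx + κI)⁻¹ Ky (Ky + κI)⁻¹ Kx.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """RBF (Gaussian) kernel matrix between rows of A and rows of B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def center_kernel(K):
    """Double-center a kernel matrix (zero mean in feature space)."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kcca_top_correlation(X, Y, reg=0.1, gamma=0.5):
    """Leading squared canonical correlation between kernelized views.

    X, Y: (n_samples, dim) matrices for the two views (e.g. acoustic
    features and articulatory positions; illustrative assumption).
    Solves the regularized KCCA eigenproblem
        (Kx + reg*I)^{-1} Ky (Ky + reg*I)^{-1} Kx  alpha = rho^2 alpha,
    whose eigenvalues are squared canonical correlations.
    """
    n = X.shape[0]
    Kx = center_kernel(rbf_kernel(X, X, gamma))
    Ky = center_kernel(rbf_kernel(Y, Y, gamma))
    I = np.eye(n)
    M = np.linalg.solve(Kx + reg * I, Ky) @ np.linalg.solve(Ky + reg * I, Kx)
    vals = np.linalg.eigvals(M)
    return float(np.max(vals.real))
```

Without the `reg` term the problem degenerates (every correlation tends to 1), which is why regularization is essential for KCCA on finite samples.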
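The second contribution, re-evaluating acoustic hypotheses by the likelihood of their articulatory realizations, can be sketched as an n-best rescoring step. This is a generic log-linear interpolation sketch, not the paper's task-dynamics model; the interpolation weight and the `articulatory_loglik` callable are hypothetical stand-ins for an articulatory likelihood model.

```python
def rescore_hypotheses(hypotheses, articulatory_loglik, weight=0.3):
    """Re-rank ASR n-best hypotheses with an articulatory score.

    hypotheses: list of (transcript, acoustic_logprob) pairs from a
        first-pass recognizer.
    articulatory_loglik: callable mapping a transcript to the
        log-likelihood of its articulatory realization (hypothetical
        stand-in for a task-dynamics model).
    weight: interpolation weight for the articulatory term (assumed).

    Returns the transcript maximizing the interpolated log score.
    """
    scored = [
        (text, (1.0 - weight) * acoustic + weight * articulatory_loglik(text))
        for text, acoustic in hypotheses
    ]
    return max(scored, key=lambda pair: pair[1])[0]
```

The point of such a combination is that an articulatorily implausible hypothesis can be demoted even when its purely acoustic score is the best, which is exactly the failure mode dysarthric speech induces in acoustic-only ASR.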