Estimating articulatory motion from speech wave
Speech Communication - Special issue: Speech research in Japan
Competitive learning algorithms for vector quantization
Neural Networks
Low-dimensional phoneme mapping using a continuity constraint
Articulatory-to-acoustic mapping for inverse problem
Speech Communication
Production models as a structural basis for automatic speech recognition
Speech Communication - Special issue on speech production: models and data
Quantitative association of vocal-tract and facial behavior
Speech Communication - Special issue on auditory-visual speech processing
The Geometry of Algorithms with Orthogonality Constraints
SIAM Journal on Matrix Analysis and Applications
Linear Prediction of Speech
Physiologically Based Speech Synthesis
Advances in Neural Information Processing Systems 5 (NIPS)
Machine-learning methods for inferring vocal-tract articulation from speech acoustics
Visualizing speech with a recurrent neural network trained on human acoustic-articulatory data
Data-driven production models for speech processing
Understanding Digital Signal Processing (2nd Edition)
Production-Oriented Models for Speech Recognition
IEICE Transactions on Information and Systems
Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '96), Volume 1
The HDM: a segmental hidden dynamic model of coarticulation
Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '99), Volume 1
Editorial note: Bridging the gap between human and automatic speech recognition
Speech Communication
A blind algorithm for recovering articulator positions from acoustics
Proceedings of the 43rd Asilomar Conference on Signals, Systems and Computers (Asilomar 2009)
Using articulatory likelihoods in the recognition of dysarthric speech
Speech Communication
Motor theories, which postulate that speech perception is tied to the linguistically significant movements of the vocal tract, have guided speech perception research for nearly four decades but have had little impact on automatic speech recognition. In this paper, we describe a signal processing technique named MIMICRI that may help link motor theory with automatic speech recognition by providing a practical approach to recovering articulator positions from acoustics. MIMICRI's name reflects three operations it can perform on time-series data: it can reduce the dimensionality of a data set (manifold inference), it can blindly invert nonlinear functions applied to the data (mapping inversion), and it can use temporal context to estimate intermediate data (contextual recovery of information). For MIMICRI to work, the signals to be analyzed must be functions of unobservable signals that lie on a linear subspace of the set of all unobservable signals. For example, MIMICRI will typically work if the unobservable signals are band-pass with a known pass-band, as is the case for articulator motions. We discuss MIMICRI's abilities as they relate to speech processing applications, particularly to inverting the mapping from speech articulator positions to acoustics. We then present a mathematical proof that explains why MIMICRI can invert nonlinear functions, which it can do even in some cases where the mapping from the unobservable variables to the observable variables is many-to-one. Finally, we show that MIMICRI accurately infers the positions of the speech articulators from speech acoustics for vowels: five parameters estimated by MIMICRI were more linearly related to articulator positions than were 128 spectral energies.
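The key structural assumption above (unobservable signals confined to a known linear subspace, e.g. a known pass-band) can be illustrated with a short sketch. This is not the MIMICRI algorithm itself; it only demonstrates the premise it relies on: signals band-limited to a known set of frequency bins span a linear subspace, and projecting onto that subspace recovers such a signal exactly while suppressing broadband disturbances. The pass-band (bins 1 through 8) and signal length are hypothetical values chosen for illustration.

```python
import numpy as np

N = 256                      # number of time samples (illustrative)
rng = np.random.default_rng(0)

# Orthonormal basis for real signals band-limited to DFT bins 1..8
# (a hypothetical pass-band standing in for articulator-motion bandwidth).
t = np.arange(N)
cols = []
for k in range(1, 9):
    cols.append(np.cos(2 * np.pi * k * t / N))
    cols.append(np.sin(2 * np.pi * k * t / N))
B, _ = np.linalg.qr(np.stack(cols, axis=1))  # columns span the subspace

# Any band-limited "articulator trajectory" is a linear combination of
# the basis columns, so it lies exactly in the subspace.
x = B @ rng.standard_normal(B.shape[1])

# Orthogonal projection onto the subspace leaves x unchanged...
P = B @ B.T
assert np.allclose(P @ x, x)

# ...and strips most of the energy of broadband noise, since only the
# noise component inside the pass-band survives the projection.
noisy = x + 0.5 * rng.standard_normal(N)
denoised = P @ noisy
print(np.linalg.norm(denoised - x) < np.linalg.norm(noisy - x))  # True
```

The projection reduces the noise energy by roughly the ratio of subspace dimension to signal length (16/256 here), which is why a known pass-band is such a strong constraint for recovering the unobservable signals.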