Exploiting nonacoustic sensors for speech encoding

Authors:
T. F. Quatieri, K. Brady, D. Messing, J. P. Campbell, W. M. Campbell, M. S. Brandstein, C. J. Weinstein, J. D. Tardelli, P. D. Gatewood
Affiliations:
MIT Lincoln Lab., Lexington, MA, USA
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2006

Abstract

The intelligibility of speech transmitted through low-rate coders is severely degraded when high levels of acoustic noise are present in the environment. Recent advances in nonacoustic sensors, including microwave radar, skin-vibration, and bone-conduction sensors, make it possible to measure glottal excitation and, more generally, vocal tract activity in ways that are relatively immune to acoustic disturbances and that can supplement the acoustic speech waveform. We are currently investigating methods of combining the outputs of these sensors for low-rate encoding according to how well each one represents specific speech characteristics in different frequency bands. Nonacoustic sensors can reveal certain speech attributes that are lost in the noisy acoustic signal, for example, low-energy consonant voice bars, nasality, and glottalized excitation. By fusing nonacoustic low-frequency and pitch content with acoustic-microphone content, we have achieved significant intelligibility gains, as measured by the Diagnostic Rhyme Test (DRT), over the 2400-bps government-standard MELPe coder across a variety of environments. By additionally fusing quantized high-band (4 to 8 kHz) speech, which requires only an additional 116 bps, we obtain further DRT gains by exploiting the ear's insensitivity to fine spectral detail in this frequency region.
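As a rough illustration of the band-wise fusion idea, the following Python sketch (using NumPy) combines two time-aligned signals at a single crossover frequency. It takes low-band content from a nonacoustic sensor and the remaining band from the acoustic microphone. The function name `fuse_bands`, the single brick-wall crossover, and the whole-signal FFT are illustrative assumptions, not the authors' fusion method.

```python
import numpy as np


def fuse_bands(acoustic, nonacoustic, fs, crossover_hz):
    """Hypothetical two-band fusion (illustrative only).

    Spectral bins below crossover_hz are taken from the nonacoustic sensor,
    which is assumed to be less affected by acoustic noise at low frequencies.
    Bins at or above crossover_hz are taken from the acoustic microphone.
    Both inputs must be time-aligned and have the same length.
    """
    acoustic = np.asarray(acoustic, dtype=float)
    nonacoustic = np.asarray(nonacoustic, dtype=float)
    if acoustic.shape != nonacoustic.shape or acoustic.ndim != 1:
        raise ValueError("signals must be 1-D, time-aligned, and the same length")

    n = acoustic.size
    # Center frequency (Hz) of each real-FFT bin.
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    low_band = freqs < crossover_hz

    spec_acoustic = np.fft.rfft(acoustic)
    spec_sensor = np.fft.rfft(nonacoustic)

    # Brick-wall selection: sensor spectrum in the low band, microphone elsewhere.
    fused_spec = np.where(low_band, spec_sensor, spec_acoustic)
    return np.fft.irfft(fused_spec, n=n)


if __name__ == "__main__":
    fs = 8000
    t = np.arange(fs) / fs
    voiced_low = np.sin(2 * np.pi * 200 * t)    # low-frequency speech energy
    speech_high = np.sin(2 * np.pi * 2000 * t)  # higher-band speech energy
    # Microphone: high band plus a corrupted copy of the low band.
    mic = speech_high + 0.5 * voiced_low
    # Sensor: clean low-band content only.
    fused = fuse_bands(mic, voiced_low, fs, crossover_hz=500.0)
    print(np.max(np.abs(fused - (voiced_low + speech_high))))
```

A practical coder would more likely work on short overlapping frames with a smooth crossover, and it could also draw pitch estimates from the sensor. The whole-signal brick-wall split is chosen here only to keep the band-selection logic easy to follow.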