Towards a New Image-Based Spectrogram Segmentation Speech Coder Optimised for Intelligibility
MMM '09 Proceedings of the 15th International Multimedia Modeling Conference on Advances in Multimedia Modeling
IEEE Transactions on Audio, Speech, and Language Processing
Speech Communication
ICCHP keynote: recognizing silent and weak speech based on electromyography
ICCHP'10 Proceedings of the 12th international conference on Computers helping people with special needs: Part I
The intelligibility of speech transmitted through low-rate coders is severely degraded when high levels of acoustic noise are present in the environment. Recent advances in nonacoustic sensors, including microwave radar, skin-vibration, and bone-conduction sensors, offer the possibility of measuring glottal excitation and, more generally, the vocal tract in a way that is relatively immune to acoustic disturbances and can supplement the acoustic speech waveform. We are investigating methods of combining the outputs of these sensors for low-rate encoding according to how well each one represents specific speech characteristics in different frequency bands. Nonacoustic sensors can reveal certain speech attributes that are lost in the noisy acoustic signal, for example low-energy consonant voice bars, nasality, and glottalized excitation. By fusing nonacoustic low-frequency and pitch content with acoustic-microphone content, we have achieved significant intelligibility gains on the Diagnostic Rhyme Test (DRT) across a variety of environments, relative to the 2400-bps government-standard MELPe coder. By additionally fusing quantized high-band (4-to-8-kHz) speech, at a cost of only 116 extra bps, we obtain further DRT gains by exploiting the ear's insensitivity to fine spectral detail in this frequency region.
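The idea of taking different frequency bands from different sensors can be illustrated with a minimal sketch. It assumes time-aligned sensor and microphone signals at the same sampling rate and uses a simple first-order complementary crossover: the low band comes from the nonacoustic sensor, and the high band comes from the microphone. The function names (`alpha_for_cutoff`, `one_pole_lowpass`, `fuse_bands`), the filter type, and the cutoff are hypothetical illustrations. They are not the authors' fusion method, which the abstract does not specify.

```python
import math


def alpha_for_cutoff(cutoff_hz, sample_rate_hz):
    """Smoothing coefficient for a one-pole low-pass with an approximate -3 dB cutoff."""
    return 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)


def one_pole_lowpass(x, alpha):
    """First-order IIR low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1]), y[-1] = 0."""
    y = []
    prev = 0.0
    for sample in x:
        prev = prev + alpha * (sample - prev)
        y.append(prev)
    return y


def fuse_bands(sensor, mic, alpha):
    """Complementary crossover (hypothetical): low band from sensor, high band from mic.

    The mic high band is taken as mic minus its own low-pass, so the low-pass
    and high-pass paths sum to unity: if sensor == mic, the output equals mic.
    """
    if len(sensor) != len(mic):
        raise ValueError("sensor and mic must be time-aligned and of equal length")
    low_sensor = one_pole_lowpass(sensor, alpha)
    low_mic = one_pole_lowpass(mic, alpha)
    return [ls + (m - lm) for ls, m, lm in zip(low_sensor, mic, low_mic)]


if __name__ == "__main__":
    # Assumed 16 kHz sampling (so 4 to 8 kHz is the upper half-band) and a 500 Hz crossover.
    a = alpha_for_cutoff(500.0, 16000.0)
    mic = [1.0] * 400      # DC-like content seen only by the microphone
    sensor = [0.0] * 400   # sensor sees nothing in this toy case
    fused = fuse_bands(sensor, mic, a)
    # Low-frequency content is taken from the sensor, so the mic's DC is rejected.
    print(round(fused[-1], 6))
```

A one-pole filter keeps the sketch short and makes the two paths exactly complementary. A practical coder would more likely use sharper filters and would fuse parameters (such as pitch and spectral envelope) rather than raw waveforms.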