Modeling coarticulation in EMG-based continuous speech recognition

Authors:
Tanja Schultz; Michael Wand
Affiliations:
Karlsruhe Institute of Technology, Cognitive Systems Laboratory, Adenauerring 4, 76131 Karlsruhe, Germany (both authors)
Venue:
Speech Communication
Year:
2010

Abstract

This paper discusses the use of surface electromyography (EMG) for automatic speech recognition. Electromyographic signals captured at the facial muscles record the activity of the human articulatory apparatus and thus make it possible to trace back a speech signal even if it is spoken silently. Since speech is captured before it becomes airborne, the resulting signal is not masked by ambient noise. The resulting Silent Speech Interface has the potential to overcome major limitations of conventional speech-driven interfaces: it is not affected by environmental noise, allows confidential information to be transmitted silently, and does not disturb bystanders. We describe our new approach of phonetic feature bundling for modeling coarticulation in EMG-based speech recognition and report results on the EMG-PIT corpus, a multi-speaker, large-vocabulary database of silent and audible EMG speech recordings that we recently collected. Our results on speaker-dependent and speaker-independent setups show that modeling the interdependence of phonetic features reduces the word error rate of the baseline system by over 33% relative. Our final system achieves a 10% word error rate for the best-recognized speaker on a 101-word vocabulary task, bringing EMG-based speech recognition within a useful range for the application of Silent Speech Interfaces.
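
For readers unfamiliar with the metric, the following is a minimal sketch of how word error rate and a relative reduction in word error rate are conventionally computed. It is not taken from the paper: the function names and the numbers in the usage note are hypothetical and serve only to illustrate the arithmetic behind a phrase such as "33% relative".

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction of the baseline error that the improved system removes."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - improved_wer) / baseline_wer
```

As an illustration with made-up values, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion against four reference words and returns 0.5, and `relative_reduction(0.30, 0.20)` is roughly 0.33. A relative reduction is measured against the baseline error, so a hypothetical drop from 30% to 20% is a one-third relative reduction but only a ten-point absolute one.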