We present a new framework for the joint analysis of throat- and acoustic-microphone (TAM) recordings to improve throat-microphone-only speech recognition. The proposed framework learns joint sub-phone patterns of the throat and acoustic microphone recordings through a parallel-branch HMM structure. These joint sub-phone patterns define temporally correlated neighborhoods, in each of which a linear prediction filter estimates a spectrally rich acoustic feature vector from the throat feature vectors. Multimodal speech recognition with throat and throat-driven acoustic features significantly improves throat-only recognition performance. Experimental evaluations on a parallel TAM database yield benchmark phoneme recognition rates of 46.81% for the throat-only system and 60.69% for the multimodal TAM system. The proposed throat-driven multimodal speech recognition system improves the phoneme recognition rate to 52.58%, a significant relative improvement over the throat-only benchmark.
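The core estimation step within each sub-phone neighborhood can be sketched as a least-squares linear mapping from throat to acoustic feature vectors. The sketch below is a minimal illustration on synthetic data, not the paper's implementation: feature dimensions, the single-neighborhood setup, and all variable names are assumptions, and the HMM-based neighborhood alignment is omitted.

```python
import numpy as np

# Hypothetical dimensions: throat-microphone features and the spectrally
# richer acoustic features they should predict (illustrative values only).
rng = np.random.default_rng(0)
n_frames, d_throat, d_acoustic = 500, 13, 26

# Synthetic frame-aligned parallel data standing in for one temporally
# correlated neighborhood of TAM recordings.
T = rng.standard_normal((n_frames, d_throat))
W_true = rng.standard_normal((d_throat, d_acoustic))
A = T @ W_true + 0.01 * rng.standard_normal((n_frames, d_acoustic))

# Linear prediction filter for this neighborhood: solve for W minimizing
# the Frobenius norm ||A - T W||^2 via least squares.
W, *_ = np.linalg.lstsq(T, A, rcond=None)

# Throat-driven acoustic feature estimates for recognition.
A_hat = T @ W
mse = float(np.mean((A_hat - A) ** 2))
print(f"reconstruction MSE: {mse:.6f}")
```

In the paper's framework one such filter would be trained per joint sub-phone pattern, with the parallel-branch HMM deciding which neighborhood (and hence which filter) each frame belongs to.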