A speaker diarization method based on the probabilistic fusion of audio-visual location information
Proceedings of the 2009 international conference on Multimodal interfaces
A method for detecting speech events under multiple-sound-source conditions using audio and video information is proposed. To detect speech events, sound localization with a microphone array and human tracking by stereo vision are combined by a Bayesian network. From the inference results of the Bayesian network, the time and location of each speech event can be obtained. The detected speech events are then exploited in a robust speech interface: a maximum-likelihood adaptive beamformer is employed as a preprocessor of the speech recognizer to separate the speech signal from environmental noise, and the beamformer coefficients are continually updated based on the speech-event information. The speech-event information is also used by the speech recognizer to extract the speech segments.
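The audio-visual fusion step described in the abstract can be illustrated with a minimal sketch: if the audio and visual observations are assumed conditionally independent given the speech-event state, a Bayesian-network-style combination reduces to multiplying per-location likelihoods with a prior and normalizing. The candidate locations, likelihood values, and function names below are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of audio-visual speech-event fusion (illustrative only).
# State: whether a speech event is active at each of a few candidate
# locations. Audio (microphone-array localization) and video (stereo
# person tracking) each yield a likelihood per location; assuming
# conditional independence given the state, the posterior is
#   P(event at x | a, v)  ∝  P(a | x) * P(v | x) * P(x)

def fuse(prior, audio_lik, video_lik):
    """Combine per-location likelihoods into a normalized posterior."""
    joint = [p * a * v for p, a, v in zip(prior, audio_lik, video_lik)]
    total = sum(joint)
    return [j / total for j in joint]

# Three hypothetical candidate locations with made-up likelihood values.
prior     = [1 / 3, 1 / 3, 1 / 3]   # uniform prior over locations
audio_lik = [0.7, 0.2, 0.1]         # array localization favors location 0
video_lik = [0.6, 0.3, 0.1]         # stereo tracking agrees

posterior = fuse(prior, audio_lik, video_lik)
print([round(p, 3) for p in posterior])  # location 0 dominates
```

Because both modalities agree, the fused posterior concentrates on the first location more sharply than either likelihood alone; when the modalities disagree, the product form naturally tempers the confidence of each.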