A two-stage multimodal speaker location-aware approach in pervasive computing

Authors:
Ruo-gui Xiao; Tong-qiang Guo
Affiliations:
Institute of Artificial Intelligence, School of Computer Science, Zhejiang University, Zheda Road 38, Yuquan Campus, Hangzhou, Zhejiang Province 310027, China (both authors)
Venue:
International Journal of Computer Applications in Technology
Year:
2010

Abstract

Location-aware computing is important in pervasive computing and intelligent video surveillance. We propose a two-stage multimodal approach for locating the active speaker in intelligent environments. In the first stage, the human voice is captured as an audio cue to estimate the approximate direction of the current speaker. In the second stage, the colour feature of the mouth region is extracted as a visual cue to detect the continuous mouth motion that identifies the active speaker. Speaking activity is recognised by a trained Hidden Markov Model that operates on the colour feature of the mouth region during continuous motion. Experiments show that the proposed multimodal approach is effective for speaker localisation in intelligent indoor environments.
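
The two stages described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation. The abstract does not say how the audio orientation is computed or how the mouth colour feature is encoded, so the sketch assumes a two-microphone time-delay estimate for the first stage and a discrete HMM over two hypothetical colour symbols (0 for lip colour dominant, 1 for dark inner-mouth colour dominant) for the second. All function names, model probabilities, and the 15-degree tolerance are invented for illustration and are not trained values.

```python
import math

def estimate_delay(left, right, max_lag):
    """Lag (in samples) maximising the cross-correlation of two microphone signals.
    A positive lag means the sound reached the left microphone first."""
    best_lag, best_score = 0, float("-inf")
    n = len(left)
    for lag in range(-max_lag, max_lag + 1):
        score = sum(left[i] * right[i + lag] for i in range(n) if 0 <= i + lag < n)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def delay_to_angle(lag, sample_rate, mic_spacing, speed_of_sound=343.0):
    """Convert a sample delay into an angle in degrees from the array broadside."""
    ratio = speed_of_sound * lag / (sample_rate * mic_spacing)
    ratio = max(-1.0, min(1.0, ratio))  # clamp against rounding and noise
    return math.degrees(math.asin(ratio))

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm for a discrete HMM; obs must be non-empty."""
    states = range(len(start))
    alpha = [start[s] * emit[s][obs[0]] for s in states]
    total = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[p] * trans[p][s] for p in states) * emit[s][obs[t]]
                     for s in states]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        total += math.log(scale)
        alpha = [a / scale for a in alpha]
    return total

# Hypothetical hand-set models (start, transition, emission); states are
# "mouth closed" and "mouth open". A real system would train these from data.
EMIT = [[0.9, 0.1], [0.1, 0.9]]
SPEAKING = ([0.5, 0.5], [[0.4, 0.6], [0.6, 0.4]], EMIT)
SILENT = ([0.9, 0.1], [[0.95, 0.05], [0.5, 0.5]], EMIT)

def locate_active_speaker(faces, audio_angle, tolerance=15.0):
    """faces: list of (angle_degrees, mouth_symbols) pairs.
    Returns the angle of the most likely active speaker, or None."""
    best_angle, best_margin = None, 0.0
    for angle, symbols in faces:
        if abs(angle - audio_angle) > tolerance:
            continue  # stage 1: keep only faces near the audio direction
        # stage 2: compare the speaking and silent models on mouth symbols
        margin = log_likelihood(symbols, *SPEAKING) - log_likelihood(symbols, *SILENT)
        if margin > best_margin:
            best_angle, best_margin = angle, margin
    return best_angle
```

In this sketch the audio cue only narrows the search to a sector, and the visual cue makes the final decision, which mirrors the coarse-then-fine ordering of the two stages. A face is accepted only when the speaking model explains its mouth symbols better than the silent model.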