An embedded audio-visual tracking and speech purification system on a dual-core processor platform

Authors:
Jwu-Sheng Hu;Ming-Tang Lee;Chia-Hsing Yang
Affiliations:
Department of Electrical and Control Engineering, National Chiao-Tung University, Hsinchu, Taiwan, ROC;Department of Electrical and Control Engineering, National Chiao-Tung University, Hsinchu, Taiwan, ROC;Department of Electrical and Control Engineering, National Chiao-Tung University, Hsinchu, Taiwan, ROC
Venue:
Microprocessors & Microsystems
Year:
2010

Abstract

This paper describes the design of an embedded audio-visual tracking and speech purification system. The system performs human face tracking, voice activity detection, sound source direction estimation, and speech enhancement in real time. The estimated sound source direction helps initialize the human face tracking module when the target changes direction. The implementation is based on an embedded dual-core processor platform, the Texas Instruments DM6446 (DaVinci), which contains an ARM core and a DSP core. For speech signal processing, an eight-channel digital microphone array was developed, and its pre-processing and interfacing functions are implemented on an Altera Cyclone II FPGA. All experiments were conducted in a real environment, and the results show that the system executes all of its audition and vision functions in real time.
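
The abstract does not say which direction-estimation algorithm the system uses. As an illustration only, the sketch below shows one common textbook approach for a single microphone pair: estimate the time delay between the two channels by cross-correlation, then convert it to a far-field arrival angle. The function names, the speed-of-sound constant, and the sample rate and microphone spacing in the usage note are all assumptions for this example and are not taken from the paper, which uses an eight-channel array.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value, not from the paper


def estimate_delay(ref, other, max_lag):
    """Return the lag in samples by which `other` trails `ref`.

    Brute-force cross-correlation over lags in [-max_lag, max_lag];
    the lag with the highest correlation score wins.
    """
    best_lag, best_score = 0, float("-inf")
    n = len(ref)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += ref[i] * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_angle(lag, sample_rate, mic_spacing):
    """Convert a sample delay to a far-field arrival angle in degrees.

    0 degrees is broadside (source straight ahead of the pair);
    the sign follows the sign of the delay.
    """
    sin_theta = SPEED_OF_SOUND * lag / (sample_rate * mic_spacing)
    sin_theta = max(-1.0, min(1.0, sin_theta))  # clamp delays beyond the physical limit
    return math.degrees(math.asin(sin_theta))
```

For example, with a hypothetical 16 kHz sample rate and 0.1 m spacing, `delay_to_angle(estimate_delay(ch0, ch1, 8), 16000, 0.1)` yields an angle that a tracker could use to decide where to look for a face. A real-time embedded implementation would more likely compute the correlation in the frequency domain on the DSP core rather than with nested loops.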