Real-time auditory and visual multiple-object tracking for humanoids

  • Authors:
  • Kazuhiro Nakadai;Ken-ichi Hidai;Hiroshi Mizoguchi;Hiroshi G. Okuno;Hiroaki Kitano

  • Affiliations:
  • Kitano Symbiotic Systems Project, ERATO, Japan Science and Technology Corp., Shibuya-ku, Tokyo, Japan;Kitano Symbiotic Systems Project, ERATO, Japan Science and Technology Corp., Shibuya-ku, Tokyo, Japan;Department of Information and Computer Science, Saitama University, Saitama, Japan;Kitano Symbiotic Systems Project, ERATO, Japan Science and Technology Corp., Shibuya-ku, Tokyo, Japan and Department of Intelligence Science and Technology, Kyoto University, Kyoto, Japan;Kitano Symbiotic Systems Project, ERATO, Japan Science and Technology Corp., Shibuya-ku, Tokyo, Japan and Sony Computer Science Laboratories, Inc., Tokyo, Japan

  • Venue:
  • IJCAI'01 Proceedings of the 17th international joint conference on Artificial intelligence - Volume 2
  • Year:
  • 2001


Abstract

This paper presents real-time auditory and visual tracking of multiple objects for a humanoid in real-world environments. Real-time processing is crucial for sensorimotor tasks in tracking, and multiple-object tracking is crucial for real-world applications. Tracking multiple sound sources requires perceiving a mixture of sounds and cancelling the motor noise caused by body movements; however, its real-time processing has not been reported yet. Real-time tracking is attained by fusing information obtained from sound source localization, multiple face recognition, speaker tracking, focus-of-attention control, and motor control. Auditory streams with sound source direction are extracted from sound sampled at 48 kHz by an active audition system with motor-noise cancellation capability. Visual streams with face ID and 3-D position are extracted from a single camera by combining skin-color extraction, correlation-based matching, and multiple-scale image generation. These auditory and visual streams are associated by comparing their spatial locations, and the associated streams are used to control the focus of attention. Auditory, visual, and association processing are performed asynchronously on different PCs connected over a TCP/IP network. The resulting system, implemented on an upper-torso humanoid, can track multiple objects with a delay of 200 msec, imposed by visual tracking and network latency.
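The abstract's core fusion step is associating auditory streams (carrying a sound-source direction) with visual streams (carrying a face ID and 3-D position) by comparing spatial locations. The sketch below illustrates one plausible way to do that by matching azimuth angles; it is not the authors' code, and the data layout, field names, and the 10-degree tolerance are assumptions made for illustration.

```python
import math

# Assumed angular tolerance (degrees) for declaring two streams co-located.
ASSOC_THRESHOLD_DEG = 10.0

def azimuth_of_position(x, y):
    """Azimuth (degrees) of a 3-D position projected onto the ground plane."""
    return math.degrees(math.atan2(y, x))

def associate(auditory_streams, visual_streams, threshold=ASSOC_THRESHOLD_DEG):
    """Pair each auditory stream with the visual stream closest in azimuth.

    auditory_streams: list of dicts with "id" and "azimuth" (degrees).
    visual_streams:   list of dicts with "face_id" and "position" (x, y, z).
    Returns a list of (auditory id, face ID) pairs within the tolerance.
    """
    pairs = []
    for a in auditory_streams:
        best, best_diff = None, threshold
        for v in visual_streams:
            v_az = azimuth_of_position(v["position"][0], v["position"][1])
            # Wrap the difference into [-180, 180) before taking its magnitude.
            diff = abs((a["azimuth"] - v_az + 180.0) % 360.0 - 180.0)
            if diff <= best_diff:
                best, best_diff = v, diff
        if best is not None:
            pairs.append((a["id"], best["face_id"]))
    return pairs
```

For example, an auditory stream localized at 30 degrees would be paired with a face whose position lies at roughly 32 degrees, since the two directions differ by well under the assumed tolerance.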