“Put-that-there”: Voice and gesture at the graphics interface
SIGGRAPH '80 Proceedings of the 7th annual conference on Computer graphics and interactive techniques
Mutual disambiguation of 3D multimodal interaction in augmented and virtual reality
Proceedings of the 5th international conference on Multimodal interfaces
CVPRW '04 Proceedings of the 2004 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'04) - Volume 12
The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features
ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision - Volume 2
Learning Object Categories from Google's Image Search
ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision - Volume 2
SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition
CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2
Towards adaptive object recognition for situated human-computer interaction
Proceedings of the 2007 workshop on Multimodal interfaces in semantic interaction
Audio-visual robot command recognition: D-META'12 grand challenge
Proceedings of the 14th ACM international conference on Multimodal interaction
Multimodal scene understanding is an integral part of human-robot interaction (HRI) in situated environments. Especially useful is category-level recognition, where the system can recognize classes of objects or scenes rather than specific instances (e.g., any chair vs. this particular chair). Humans use multiple modalities to understand which object category is being referred to, simultaneously interpreting gesture, speech, and visual appearance, and using one modality to disambiguate the information contained in the others. In this paper, we address the problem of fusing visual and acoustic information to predict object categories when an image of the object and speech input from the user are available to the HRI system. Using probabilistic decision fusion, we show improved classification rates on a dataset containing a wide variety of object categories, compared to using either modality alone.
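The abstract does not spell out the fusion rule; one common form of probabilistic decision fusion is a weighted product (log-linear combination) of the per-modality class posteriors. The sketch below is an illustration under that assumption; the function name, the weight `w`, and the example posteriors are all hypothetical, not taken from the paper.

```python
import numpy as np

def fuse_posteriors(p_visual, p_speech, w=0.5):
    """Weighted product-rule fusion of two per-class posterior vectors.

    p_visual, p_speech: arrays of class posteriors from each modality.
    w: weight on the visual modality (1 - w goes to speech).
    Small epsilon guards against log(0).
    """
    log_fused = w * np.log(p_visual + 1e-12) + (1 - w) * np.log(p_speech + 1e-12)
    fused = np.exp(log_fused)
    return fused / fused.sum()  # renormalize to a distribution

# Hypothetical example: vision is ambiguous between classes 0 and 1,
# speech is confident about class 0; fusion lets the more confident
# modality disambiguate the other.
p_vis = np.array([0.40, 0.35, 0.25])
p_sp = np.array([0.80, 0.10, 0.10])
fused = fuse_posteriors(p_vis, p_sp)
print(fused.argmax())  # -> 0
```

A weighted sum of posteriors is an equally common alternative; the product rule corresponds to treating the modalities as conditionally independent given the class.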