We present a video demonstration of an agent-based test-bed application for ongoing research into multi-user, multimodal, computer-assisted meetings. The system tracks a two-person scheduling meeting: one person stands at a touch-sensitive whiteboard creating a Gantt chart, while another looks on in view of a calibrated stereo camera. The stereo camera performs real-time, untethered, vision-based tracking of the onlooker's head, torso, and limb movements, which are routed to a 3D-gesture recognition agent. Using speech, 3D deictic gesture, and 2D object de-referencing, the system is able to track the onlooker's suggestion to move a specific milestone. The system also has a speech recognition agent capable of recognizing out-of-vocabulary (OOV) words as phonetic sequences. Thus, when a user at the whiteboard speaks an OOV label name for a chart constituent while also writing it, the OOV speech is combined with the letter sequences hypothesized by the handwriting recognizer to yield an orthography, pronunciation, and semantics for the new label. These are then learned dynamically by the system and become immediately available for future recognition.
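To make the OOV-learning step more concrete, the sketch below shows roughly how a phonetic hypothesis from the speech recognizer and letter-sequence hypotheses from the handwriting recognizer could be fused into a new lexicon entry that is immediately available for later recognition. This is a minimal, hypothetical illustration in Python; the class, field, and function names are assumptions for exposition and do not reflect the demonstrated system's actual agents or APIs.

    # Hypothetical sketch of the OOV-learning step described above: a phone
    # sequence hypothesized for the OOV speech is paired with the top letter
    # sequence hypothesized by the handwriting recognizer to form a new
    # lexicon entry (orthography, pronunciation, semantics). All names here
    # are illustrative, not the system's real interfaces.

    from dataclasses import dataclass

    @dataclass
    class LexiconEntry:
        orthography: str          # spelled form taken from handwriting hypotheses
        pronunciation: list[str]  # phone sequence from the OOV speech hypothesis
        semantics: str            # role inferred from context, e.g. a chart label

    class DynamicLexicon:
        """Toy stand-in for the vocabulary shared by the recognizers."""

        def __init__(self) -> None:
            self.entries: dict[str, LexiconEntry] = {}

        def learn(self, letter_hyps: list[str], phone_hyp: list[str],
                  semantics: str) -> LexiconEntry:
            # Take the top-ranked letter-sequence hypothesis as the orthography.
            orthography = letter_hyps[0]
            entry = LexiconEntry(orthography, phone_hyp, semantics)
            # Register the new word so future recognition can use it immediately.
            self.entries[orthography.lower()] = entry
            return entry

    if __name__ == "__main__":
        lexicon = DynamicLexicon()
        # Example: the user writes and says the OOV milestone label "Kickoff".
        new_entry = lexicon.learn(
            letter_hyps=["Kickoff", "Kickofl"],
            phone_hyp=["k", "ih", "k", "ao", "f"],
            semantics="gantt_chart_label",
        )
        print(new_entry)

In the demonstrated system this logic would be distributed across speech, handwriting, and integration agents; the single-class sketch only illustrates the pieces of information that must be fused to yield a learnable entry.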