Taming recognition errors with a multimodal interface. Communications of the ACM.
Dasher: a data entry interface using continuous gestures and language models. UIST '00: Proceedings of the 13th Annual ACM Symposium on User Interface Software and Technology.
Multimodal error correction for speech user interfaces. ACM Transactions on Computer-Human Interaction (TOCHI).
Phrase sets for evaluating text entry techniques. CHI '03 Extended Abstracts on Human Factors in Computing Systems.
Text entry using a dual joystick game controller. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems.
Now Dasher! Dash away!: longitudinal study of fast text entry by eye gaze. Proceedings of the 2008 Symposium on Eye Tracking Research & Applications.
Parakeet: a continuous speech recognition system for mobile touch-screen devices. Proceedings of the 14th International Conference on Intelligent User Interfaces.
Interactive ASR error correction for touchscreen devices. HLT-Demonstrations '08: Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Demo Session.
Sphinx-4: a flexible open source framework for speech recognition.
Speech Dasher: fast writing using speech and gaze. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems.
MoFIS: a mobile user interface for semi-automatic extraction of food product ingredient lists. Proceedings of the Companion Publication of the 2013 International Conference on Intelligent User Interfaces Companion.
Freehand gestural text entry for interactive TV. Proceedings of the 11th European Conference on Interactive TV and Video.
SpeeG2: a speech- and gesture-based interface for efficient controller-free text input. Proceedings of the 15th ACM International Conference on Multimodal Interaction.
MoveRC: attention-aware remote control. Proceedings of the 19th Brazilian Symposium on Multimedia and the Web.
We present SpeeG, a multimodal speech- and body-gesture-based text input system targeting media centres, set-top boxes and game consoles. Our controller-free zoomable user interface combines speech input with gesture-based real-time correction of the recognised voice input. The open source CMU Sphinx voice recogniser transforms speech input into written text, while Microsoft's Kinect sensor is used for hand gesture tracking. A modified version of the zoomable Dasher interface combines the input from Sphinx and the Kinect sensor. In contrast to existing speech error correction solutions, which separate a detection phase from a correction phase, our SpeeG text input system enables continuous real-time error correction. An evaluation of the SpeeG prototype revealed that, after a minimal learning phase, users achieved low error rates at a text input speed of about six words per minute. Moreover, in a user study SpeeG was perceived as the fastest of all evaluated user interfaces and therefore represents a promising candidate for future controller-free text input.
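To illustrate the kind of fusion the abstract describes, the following is a minimal sketch (not the authors' implementation) of how word hypotheses and confidences from a speech recogniser could bias a Dasher-style zoomable display: each candidate next character is given screen area proportional to the confidence mass of the hypotheses it is consistent with, and the hand-tracked cursor then zooms into one of them. The function name `next_char_probs` and the mocked recogniser output are assumptions for this example.

```python
from collections import defaultdict

def next_char_probs(hypotheses, prefix):
    """Distribute each hypothesis's confidence onto the character that
    would follow the already-confirmed prefix. In a Dasher-style
    interface, each character's display area is proportional to its
    probability, so likely continuations are easier to steer into."""
    weights = defaultdict(float)
    for word, confidence in hypotheses:
        # Only hypotheses still consistent with the confirmed prefix
        # contribute; the character right after the prefix gets the mass.
        if word.startswith(prefix) and len(word) > len(prefix):
            weights[word[len(prefix)]] += confidence
    total = sum(weights.values())
    if total == 0:
        return {}
    return {char: weight / total for char, weight in weights.items()}

if __name__ == "__main__":
    # Mocked recogniser output for one spoken word (hypothetical values).
    hyps = [("speech", 0.6), ("speed", 0.3), ("spend", 0.1)]
    # After the user has confirmed "spe", 'e' dominates the zoom area.
    print(next_char_probs(hyps, "spe"))
```

In a real system the hypothesis list would come from the recogniser's n-best output and be renormalised after every confirmed character, so a misrecognised word can be steered away from continuously rather than in a separate correction phase.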