Speech Dasher: fast writing using speech and gaze

Authors:
Keith Vertanen; David J.C. MacKay
Affiliations:
University of Cambridge, Cambridge, United Kingdom; University of Cambridge, Cambridge, United Kingdom
Venue:
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Year:
2010

Abstract

Speech Dasher allows writing using a combination of speech and a zooming interface. Users first speak what they want to write and then they navigate through the space of recognition hypotheses to correct any errors. Speech Dasher's model combines information from a speech recognizer, from the user, and from a letter-based language model. This allows fast writing of anything predicted by the recognizer while also providing seamless fallback to letter-by-letter spelling for words not in the recognizer's predictions. In a formative user study, expert users wrote at 40 (corrected) words per minute. They did this despite a recognition word error rate of 22%. Furthermore, they did this using only speech and the direction of their gaze (obtained via an eye tracker).
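
The idea of combining recognizer output with a letter-based language model, with fallback to spelling, can be illustrated with a small sketch. This is not the authors' model: it assumes a hypothetical n-best list of (sentence, probability) pairs from a recognizer, a hypothetical `letter_lm` function that returns a next-letter distribution, and a simple linear interpolation with an assumed `fallback_weight`. When no hypothesis matches what the user has written so far, the sketch relies entirely on the letter model, which loosely mirrors the letter-by-letter fallback described above.

```python
from collections import defaultdict

ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def uniform_letter_lm(prefix):
    """Stand-in letter model (assumption): every symbol is equally likely."""
    return {c: 1.0 / len(ALPHABET) for c in ALPHABET}


def next_letter_distribution(prefix, hypotheses, letter_lm, fallback_weight=0.1):
    """Mix next-letter evidence from recognizer hypotheses with a letter model.

    prefix          -- text the user has written so far
    hypotheses      -- list of (sentence, probability) pairs (hypothetical n-best list)
    letter_lm       -- function mapping a prefix to {letter: probability}
    fallback_weight -- share of probability given to the letter model (assumed value)
    """
    # Collect probability mass for each next letter among hypotheses
    # that agree with the text written so far.
    rec = defaultdict(float)
    for text, p in hypotheses:
        if text.startswith(prefix) and len(text) > len(prefix):
            rec[text[len(prefix)]] += p
    total = sum(rec.values())

    lm = letter_lm(prefix)
    # If no hypothesis is consistent with the prefix, use the letter model alone.
    w = fallback_weight if total > 0 else 1.0

    dist = {}
    for c in set(rec) | set(lm):
        rec_prob = rec[c] / total if total > 0 else 0.0
        dist[c] = (1.0 - w) * rec_prob + w * lm.get(c, 0.0)
    return dist


if __name__ == "__main__":
    nbest = [("the cat sat", 0.6), ("the cap sat", 0.4)]
    # Prefix consistent with the recognizer: "t" and "p" dominate.
    print(next_letter_distribution("the ca", nbest, uniform_letter_lm))
    # Prefix the recognizer did not predict: pure letter-model fallback.
    print(next_letter_distribution("the do", nbest, uniform_letter_lm))
```

In a zooming interface, a distribution like this would decide how much screen area each next letter receives, so letters supported by the recognizer are easy to select while every other letter remains reachable.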