Dictation using speech recognition could potentially serve as an efficient input method for touchscreen devices. However, dictation systems today follow a mentally disruptive speech interaction model: users must first formulate utterances and then produce them, as they would with a voice recorder. Because utterances are not transcribed until users have finished speaking, the entire output appears at once, forcing users to break their train of thought to verify and correct it. In this paper, we introduce Voice Typing, a new speech interaction model in which users' utterances are transcribed as they produce them, enabling real-time error identification. For fast correction, users invoke a marking menu with touch gestures. Voice Typing aspires to create an experience akin to having a secretary type for you while you monitor and correct the text. In a user study where participants composed emails using both Voice Typing and traditional dictation, they not only reported lower cognitive demand for Voice Typing but also exhibited a 29% relative reduction in user corrections. Overall, they also preferred Voice Typing.