Although commercial dictation systems and speech-enabled telephone voice user interfaces have become readily available, speech recognition errors remain a serious problem in the design and implementation of speech user interfaces. Previous work hypothesized that switching modality could speed up interactive correction of recognition errors. This article presents multimodal error correction methods that allow the user to correct recognition errors efficiently without keyboard input. Correction accuracy is maximized by novel recognition algorithms that use context information for recognizing correction input. Multimodal error correction is evaluated in the context of a prototype multimodal dictation system. The study shows that unimodal repair is less accurate than multimodal error correction. On a dictation task, multimodal correction is faster than unimodal correction by respeaking. The study also provides empirical evidence that system-initiated error correction (based on confidence measures) may not expedite error correction. Furthermore, the study suggests that recognition accuracy determines user choice between modalities: while users initially prefer speech, they learn to avoid ineffective correction modalities with experience. To extrapolate results from this user study, the article introduces a performance model of (recognition-based) multimodal interaction that predicts input speed including time needed for error correction. Applied to interactive error correction, the model predicts the impact of improvements in recognition technology on correction speeds, and the influence of recognition accuracy and correction method on the productivity of dictation systems. This model is a first step toward formalizing multimodal interaction.
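The performance model described above predicts effective input speed from raw entry rate, recognition accuracy, and the time needed to repair each error. A minimal sketch of that kind of prediction follows; the function name, parameters, and the simple additive form are illustrative assumptions, not the article's actual equations:

```python
# Hypothetical sketch of a throughput model for recognition-based text entry:
# effective speed = words produced / (entry time + time spent correcting errors).
# All names and the closed form here are assumptions for illustration only.

def effective_throughput(entry_rate_wpm: float,
                         word_accuracy: float,
                         correction_time_s: float,
                         words: float = 100.0) -> float:
    """Corrected words per minute, given raw entry rate (wpm), recognition
    word accuracy (0..1), and average seconds to repair one misrecognized word."""
    entry_time_min = words / entry_rate_wpm
    misrecognized = words * (1.0 - word_accuracy)
    correction_time_min = misrecognized * correction_time_s / 60.0
    return words / (entry_time_min + correction_time_min)

if __name__ == "__main__":
    # Dictating at 100 wpm with 90% word accuracy and 12 s per repair:
    print(round(effective_throughput(100.0, 0.90, 12.0), 1))  # 33.3
```

Under these assumed numbers, correction time dominates: the same model shows how improving accuracy from 90% to 98%, or shortening each repair, raises effective throughput, which is the kind of trade-off the article's model is used to predict.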