Real-time captioning provides deaf and hard-of-hearing people with immediate access to spoken language and enables them to participate in dialogue with others. Low latency is critical because it allows speech to be paired with relevant visual cues. Currently, the only reliable source of real-time captions is expensive stenographers, who must be recruited in advance and are trained to use specialized keyboards. Automatic speech recognition (ASR) is less expensive and available on demand, but its low accuracy, high sensitivity to noise, and need for prior training render it unusable in real-world situations. In this paper, we introduce a new approach in which groups of non-expert captionists (people who can hear and type) collectively caption speech in real time, on demand. We present Legion:Scribe, an end-to-end system that allows deaf people to request captions at any time. We introduce an algorithm for merging partial captions into a single output stream in real time, and a captioning interface designed to encourage coverage of the entire audio stream. An evaluation with 20 local participants and 18 crowd workers shows that non-experts can provide an effective captioning solution, accurately covering an average of 93.2% of an audio stream with only 10 workers, at an average per-word latency of 2.9 seconds. More generally, our model, in which multiple workers contribute partial inputs that are automatically merged in real time, may be extended to allow dynamic groups to surpass their constituent individuals (even experts) on a variety of human performance tasks.
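The core idea of merging partial inputs from multiple workers can be illustrated with a minimal sketch. This is not the paper's actual merging algorithm; it simply assumes each worker produces timestamped `(time, word)` pairs, sorts all pairs by time, and suppresses a word that another worker already contributed within a short time window. The function name and the `window` parameter are illustrative assumptions.

```python
def merge_partial_captions(worker_streams, window=0.5):
    """Merge timestamped (time, word) streams from several workers into
    a single caption stream.

    Matching words contributed by different workers within `window`
    seconds of each other are treated as duplicates; the earliest
    occurrence wins.  This is a toy stand-in for a real alignment-based
    caption merger.
    """
    # Flatten all (time, word) pairs from every worker and sort by time.
    events = sorted(
        (t, w.lower()) for stream in worker_streams for (t, w) in stream
    )
    merged = []
    for t, w in events:
        # Skip the word if an identical word was already emitted
        # within `window` seconds (check only the recent tail).
        if any(w == mw and abs(t - mt) <= window for mt, mw in merged[-5:]):
            continue
        merged.append((t, w))
    return [w for _, w in merged]
```

For example, two workers who each type a different subset of "the quick brown fox" — with overlapping words arriving a fraction of a second apart — would be merged into a single complete stream, which mirrors the abstract's point that partial contributions can jointly cover the entire audio.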