Real-time captioning provides deaf and hard-of-hearing people with immediate access to spoken language and enables them to participate in dialogue with others. Low latency is critical because it allows speech to be paired with relevant visual cues. Currently, the only reliable source of real-time captions is expensive stenographers, who must be recruited in advance and are trained to use specialized keyboards. Automatic speech recognition (ASR) is less expensive and available on demand, but its low accuracy, high sensitivity to noise, and need for prior training render it unusable in real-world situations. In this paper, we introduce a new approach in which groups of non-expert captionists (people who can hear and type) collectively caption speech in real time, on demand. We present Legion:Scribe, an end-to-end system that allows deaf people to request captions at any time. We introduce an algorithm for merging partial captions into a single output stream in real time, and a captioning interface designed to encourage coverage of the entire audio stream. An evaluation with 20 local participants and 18 crowd workers shows that non-experts can provide an effective captioning solution, accurately covering an average of 93.2% of an audio stream with only 10 workers, at an average per-word latency of 2.9 seconds. More generally, our model, in which multiple workers contribute partial inputs that are automatically merged in real time, may be extended to allow dynamic groups to surpass their constituent individuals (even experts) on a variety of human performance tasks.
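The core idea of merging partial inputs from multiple workers can be illustrated with a minimal sketch. This is not the paper's actual merging algorithm; it simply assumes each worker produces timestamped `(time, word)` pairs, sorts all pairs by time, and suppresses a word that another worker already contributed within a short time window. The function name and the `window` parameter are illustrative assumptions.

```python
def merge_partial_captions(worker_streams, window=0.5):
    """Merge timestamped (time, word) streams from several workers into
    a single caption stream.

    Matching words contributed by different workers within `window`
    seconds of each other are treated as duplicates; the earliest
    occurrence wins.  This is a toy stand-in for a real alignment-based
    caption merger.
    """
    # Flatten all (time, word) pairs from every worker and sort by time.
    events = sorted(
        (t, w.lower()) for stream in worker_streams for (t, w) in stream
    )
    merged = []
    for t, w in events:
        # Skip the word if an identical word was already emitted
        # within `window` seconds (check only the recent tail).
        if any(w == mw and abs(t - mt) <= window for mt, mw in merged[-5:]):
            continue
        merged.append((t, w))
    return [w for _, w in merged]
```

For example, two workers who each type a different subset of "the quick brown fox" — with overlapping words arriving a fraction of a second apart — would be merged into a single complete stream, which mirrors the abstract's point that partial contributions can jointly cover the entire audio.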