The effect of speech recognition accuracy rates on the usefulness and usability of webcast archives

Authors:
Cosmin Munteanu; Ronald Baecker; Gerald Penn; Elaine Toms; David James
Affiliations:
University of Toronto, Toronto, Canada; University of Toronto, Toronto, Canada; University of Toronto, Toronto, Canada; Dalhousie University, Halifax, Canada; University of Toronto, Toronto, Canada
Venue:
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Year:
2006

Abstract

The widespread availability of broadband connections has led to increased use of Internet broadcasting (webcasting). Most webcasts are archived and later accessed many times. Without transcripts of what was said, users have difficulty searching and scanning these archives for specific topics. This research investigates what transcription accuracy users need in webcast archives, measuring how transcript quality affects both user performance on a question-answering task and overall user experience. We tested 48 subjects in a within-subjects design under 4 conditions: perfect transcripts, transcripts with a 25% Word Error Rate (WER), transcripts with a 45% WER, and no transcript. Our data reveal that speech recognition accuracy linearly influences both user performance and user experience, show that transcripts with 45% WER are unsatisfactory, and suggest that transcripts with a WER of 25% or less would be useful and usable in webcast archives.
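The experimental conditions are defined by Word Error Rate. For readers unfamiliar with the metric, a minimal sketch of its conventional definition follows: WER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in a minimum edit-distance alignment of the recognizer output against a reference transcript of N words. This is an illustration only, not the authors' scoring tool; the function name `word_error_rate` is hypothetical, and the sketch assumes plain whitespace tokenization with no case or punctuation normalization. At 25% WER, roughly 1 in 4 reference words is in error; at 45%, roughly 9 in 20.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: (substitutions + deletions + insertions) / reference length.

    Assumption: words are split on whitespace; no normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the first i reference words
    # and the first j hypothesis words (row i - 1 of the DP table).
    prev = list(range(len(hyp) + 1))  # empty reference: j insertions
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)  # empty hypothesis: i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr

    # Total edits divided by reference length; can exceed 1.0 with many insertions.
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    reference = "the lecture covers speech recognition for webcast archives"
    hypothesis = "the lecture covers speech wreck ignition for webcast archives"
    print(f"WER = {word_error_rate(reference, hypothesis):.0%}")
```

In the usage example, "recognition" is misrecognized as "wreck ignition" (one substitution plus one insertion), giving 2 errors over 8 reference words, or a WER of 25%, the same level as the paper's better automatic-transcript condition. Because insertions are counted, WER is not capped at 100%.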