Interactive visualisation techniques for dynamic speech transcription, correction and training

Authors:
Saturnino Luz; Masood Masoodian; Bill Rogers
Affiliations:
Trinity College Dublin, Ireland; The University of Waikato, New Zealand; The University of Waikato, New Zealand
Venue:
Proceedings of the 9th ACM SIGCHI New Zealand Chapter's International Conference on Human-Computer Interaction: Design Centered HCI
Year:
2008

Abstract

As performance gains in automatic speech recognition systems plateau, improvements to existing applications of speech recognition technology seem more likely to come from better user interface design than from further progress in core recognition components. Among all applications of speech recognition, the usability of systems for transcription of spontaneous speech is particularly sensitive to high word error rates. This paper presents a series of approaches to improving the usability of such applications. We propose new mechanisms for error correction, use of contextual information, and use of 3D visualisation techniques to improve user interaction with a recogniser and maximise the impact of user feedback. These proposals are illustrated through several prototypes which target tasks such as off-line transcript editing, dynamic transcript editing, and real-time visualisation of recognition paths. An evaluation of our dynamic transcript editing system demonstrates the gains that can be made by adding the corrected words to the recogniser's dictionary and then propagating the user's corrections.