Editing speech data is currently time-consuming and error-prone. Speech editors rely on acoustic waveform representations, which force users to repeatedly sample the underlying speech to identify words and phrases to edit. Instead we developed a semantic editor that reduces the need for extensive sampling by providing access to meaning. The editor shows a time-aligned errorful transcript produced by applying automatic speech recognition (ASR) to the original speech. Users visually scan the words in the transcript to identify important phrases. They then edit the transcript directly using standard word processing 'cut and paste' operations, which extract the corresponding time-aligned speech. ASR errors mean that users must supplement what they read in the transcript by accessing the original speech. Even when there are transcript errors, however, the semantic representation still provides users with enough information to target what they edit and play, reducing the need for extensive sampling. A laboratory evaluation showed that semantic editing is more efficient than acoustic editing even when ASR is highly inaccurate.
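The core mechanism described above — cut-and-paste edits on a time-aligned transcript extracting the corresponding speech — can be illustrated with a small sketch. This is not the paper's implementation; the data layout (a list of `(word, start_sec, end_sec)` tuples, as word-level ASR alignments are commonly represented) and the function name are assumptions for illustration only.

```python
# Illustrative sketch only: how a transcript-level selection maps to a
# time span in the underlying audio, given word-level ASR alignments.
# The (word, start_sec, end_sec) representation is an assumption, not
# the paper's actual data structure.

def extract_speech_span(transcript, first_idx, last_idx):
    """Return (text, start_sec, end_sec) for the selected word range.

    Selecting words in the transcript (e.g. via 'cut and paste')
    yields the time span of speech to extract from the recording.
    """
    words = transcript[first_idx:last_idx + 1]
    text = " ".join(w for w, _, _ in words)
    start = words[0][1]   # start time of the first selected word
    end = words[-1][2]    # end time of the last selected word
    return text, start, end

# Hypothetical ASR output for a short voicemail fragment.
transcript = [
    ("please", 0.00, 0.30),
    ("call",   0.30, 0.55),
    ("me",     0.55, 0.70),
    ("back",   0.70, 1.05),
]

print(extract_speech_span(transcript, 1, 3))
# → ('call me back', 0.3, 1.05)
```

Even when the ASR transcript contains errors, the time span returned here still points at the correct region of audio, which is why the edited word range can be checked cheaply by playing just that span rather than sampling the whole recording.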